GENOME-ENABLED PREDICTION RIDGE REGRESSION GENOMIC BLUP ROBUST METHODS: TMAP AND LMAP
2 RIDGE REGRESSION: THE ESTIMATOR (HOERL AND KENNARD, 1970) WAS DERIVED USING A CONSTRAINED MINIMIZATION ARGUMENT. The tuning constant is the reciprocal of the Lagrange multiplier (the regularization parameter in machine learning). Expected value, bias, and covariance matrix of the estimator follow.
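The slide's formulas did not survive extraction; a standard reconstruction of the ridge estimator and its first two moments (notation assumed: λ for the regularization parameter, σ²ₑ for the residual variance):

```latex
\begin{align*}
\hat{\beta}_{\lambda} &= (X'X + \lambda I)^{-1} X'y, \qquad \lambda \ge 0,\\
E(\hat{\beta}_{\lambda}) &= (X'X + \lambda I)^{-1} X'X\,\beta,\\
\mathrm{Bias}(\hat{\beta}_{\lambda}) &= E(\hat{\beta}_{\lambda}) - \beta = -\lambda\,(X'X + \lambda I)^{-1}\beta,\\
\mathrm{Var}(\hat{\beta}_{\lambda}) &= \sigma_e^{2}\,(X'X + \lambda I)^{-1} X'X\,(X'X + \lambda I)^{-1}.
\end{align*}
```

At λ = 0 the bias vanishes and ordinary least squares is recovered; as λ grows, bias increases while variance shrinks.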
3 NOTE: THE INTERCEPT IS NOT REGULARIZED
4 WAYS IN WHICH RIDGE REGRESSION CAN BE INTERPRETED:
- As a shrunken estimator of regressions or marker effects (ridge).
- As a predictor of random effects (BLUP) [THIS IS A CRYPTIC INTERPRETATION BECAUSE WE WISH TO LEARN GENE EFFECTS: these do not vary at random, but over a conceptual distribution].
- As the mean of a conditional posterior distribution (Bayes).
- As the maximum of a penalized likelihood under the L2 norm (PMLE).
In all cases we need ω OR the variance ratio (and the individual variances for interval inference).
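The four interpretations above all deliver the same point estimate; a sketch of the correspondence, assuming the model y = Xβ + e with β ~ N(0, Iσ²_β) and e ~ N(0, Iσ²ₑ):

```latex
\begin{align*}
\hat{\beta} &= (X'X + \omega I)^{-1} X'y, \qquad \omega = \sigma_e^{2}/\sigma_\beta^{2},\\
\hat{\beta} &= \mathrm{BLUP}(\beta) = E(\beta \mid y)
  = \arg\max_{\beta}\left[ -\frac{1}{2\sigma_e^{2}}\,\|y - X\beta\|^{2}
  - \frac{1}{2\sigma_\beta^{2}}\,\beta'\beta \right].
\end{align*}
```

Differentiating the penalized objective gives the normal equations (X'X + ωI)β = X'y, which is exactly the ridge system, so the BLUP, posterior-mean, and PMLE readings are the same computation once ω is fixed.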
5 BLUP
8 Bayes IN THE BAYESIAN INTERPRETATION, β HAS A TRUE, FIXED VALUE. THE RANDOMNESS REPRESENTS UNCERTAINTY PRIOR AND POSTERIOR TO OBSERVING DATA
9 RIDGE REGRESSION EXAMPLE 1
10 RIDGE REGRESSION EXAMPLE 2
14 ASSIGNING A VALUE TO ω: GENERALIZED CROSS-VALIDATION; VARIANCE COMPONENTS + BRUTE FORCE.
15 GCV: MOTIVATION The residual SS will be called T(λ)
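With T(λ) defined as the residual sum of squares, the standard GCV criterion (a reconstruction in the slide's notation; H(λ) denotes the hat matrix) is:

```latex
\begin{align*}
H(\lambda) &= X\,(X'X + \lambda I)^{-1} X', \qquad
T(\lambda) = \| y - H(\lambda)\,y \|^{2},\\
\mathrm{GCV}(\lambda) &= \frac{T(\lambda)/n}{\left[\, 1 - \mathrm{tr}\,H(\lambda)/n \,\right]^{2}},
\end{align*}
```

and λ is chosen to minimize GCV(λ), trading off fit against the effective number of parameters tr H(λ).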
16 SAME TRAIT IN 4 DIFFERENT ENVIRONMENTS: FOUR DIFFERENT BEHAVIORS!
17 GENOMIC BLUP
19 RECALL: Genotypes (random variable W denotes the genotype at a locus; p = freq(A), q = 1 − p).
Coding 1: W(aa) = −1, W(Aa) = 0, W(AA) = 1. Under Hardy–Weinberg, E_HW(W) = p² − q² = p − q, and Var_HW(W) = E(W²) − [E(W)]² = p² + q² − (p − q)² = 2pq.
Coding 2: W(aa) = 0, W(Aa) = 1, W(AA) = 2. Then E_HW(W) = 2p² + 2pq = 2p, and Var_HW(W) = 4p² + 2pq − 4p² = 2pq.
Coding does not affect the variance of genotypes, but the mean shifts (2p vs. p − q = 2p − 1). Deviations from the mean are invariant to this type of coding:
W, coding 1: −1, 0, 1 → W − E(W): −2p, 1 − 2p, 2q
W, coding 2: 0, 1, 2 → W − E(W): −2p, 1 − 2p, 2q
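The means and variances above can be checked numerically; a minimal sketch (the allele frequency 0.3 is an arbitrary illustrative value, not from the slides):

```python
import numpy as np

p = 0.3                                  # illustrative frequency of allele A
q = 1 - p
# Hardy-Weinberg genotype frequencies for aa, Aa, AA
freqs = np.array([q**2, 2 * p * q, p**2])

stats = {}
for codes in ((-1, 0, 1), (0, 1, 2)):    # the two codings on the slide
    w = np.array(codes, dtype=float)
    mean = float(np.sum(freqs * w))
    var = float(np.sum(freqs * w**2) - mean**2)
    stats[codes] = (mean, var)

# Both codings give Var = 2pq; the means differ by the constant shift 1
print(stats)
```

Running this shows both codings yield variance 2pq = 0.42, with means p − q = −0.4 and 2p = 0.6.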
20 A LOOK AT VAN RADEN'S GENOMIC RELATIONSHIP MATRIX. Let X = {x_ij} be the n × p matrix of marker codes (rows = individuals, columns = markers). Then XX' is n × n, with diagonal elements Σ_j x_ij² and off-diagonal elements Σ_j x_ij x_i'j (cross-products of marker codes between individuals i and i').
21 In Van Raden's G-matrix: if all elements of G(VR) are divided by 2 Σ_j p_j q_j, then the scale is consistent with A. Under the 0/1/2 coding, E(x_ij) = 2p_j and Var(x_ij) = E(x_ij²) − [E(x_ij)]² = 2p_j q_j. For two individuals with additive relationship a_12, Cov(x_1j, x_2j) = E(x_1j x_2j) − E(x_1j)E(x_2j) = a_12 2p_j q_j. Or, if the x's are centered, the expected cross-product gives the additive relationship. Note: LD does not enter into this form of genomic relationship matrix.
22 UNDER HARDY-WEINBERG AND IDEALIZED CONDITIONS: E(XX') is a symmetric matrix whose (i, i') element is a_ii' · 2 Σ_j p_j q_j, i.e., the additive relationships a_12, ..., a_1n, a_2n, ..., a_n,n−1 all scaled by the factor 2 Σ_j p_j q_j.
23 Likewise, if the x's are centered: E{[X − E(X)][X − E(X)]'} / (2 Σ_j p_j q_j) = A, a symmetric matrix with 1's on the diagonal and the additive relationships a_12, ..., a_1n, a_2n, ..., a_n,n−1 off the diagonal. A = n × n matrix of additive relationships.
24 Then, the genomic relationship matrix G = [X − E(X)][X − E(X)]' / (2 Σ_j p_j q_j) = X_c X_c' / V_M,HW is the realization of a process. If this process is the HW process, then its expectation is E{[X − E(X)][X − E(X)]'} / (2 Σ_j p_j q_j) = A. For example: parent and offspring are expected to have a relationship = 0.5, but in reality it could be larger or smaller.
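The "realization of a process" point can be illustrated by simulation: genotypes drawn under Hardy-Weinberg for unrelated individuals give a realized G that fluctuates around its expectation (here the identity matrix). This is an illustrative sketch; the sample sizes, seed, and frequency range below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 5000                        # individuals, markers (illustrative)
p = rng.uniform(0.1, 0.9, size=m)       # allele frequencies

# Genotypes coded 0/1/2, sampled independently under Hardy-Weinberg
X = rng.binomial(2, p, size=(n, m)).astype(float)

Xc = X - 2 * p                          # center each marker by 2*p_j
G = Xc @ Xc.T / (2 * np.sum(p * (1 - p)))   # Van Raden's G

# For unrelated HW individuals: E(diagonal) = 1, E(off-diagonal) = 0,
# but individual realized relationships scatter around those values
off = G[~np.eye(n, dtype=bool)]
print(G.diagonal().mean(), off.mean(), off.std())
```

The diagonal averages close to 1 and the off-diagonals close to 0, but any single realized relationship deviates from its expectation, exactly as the slide notes for parent-offspring pairs.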
25 MANY G-MATRICES (each may provide a different variance component decomposition). Examples:
- G(VR) = X_cent X_cent' / [2 Σ_j p_j (1 − p_j)]
- G(ST) = (1/p) X_std X_std', where x_std,ij = (x_ij − x̄_j) / √Var(x_ij)
- G = ½ [G(VR) + G(ST)], followed by some scaling?
- G(W) = (1/p) X_std W X_std', where W uses LD information?
- G(Blend): a weighted blend of G(ST) and A, after some re-scaling of the matrices
- G(VR) scaled into (0,2) with a min-max function: g*_ij = 2 [g_ij(VR) − min g(VR)] / [max g(VR) − min g(VR)]
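The min-max rescaling in the last bullet can be sketched as follows (the matrix values are made up; the wheat example later in the slides does the same thing in R):

```python
import numpy as np

def scale_to_0_2(G):
    """Min-max rescale all elements of a G-matrix into [0, 2]."""
    gmin, gmax = G.min(), G.max()
    return 2 * (G - gmin) / (gmax - gmin)

# Toy 3x3 "G-matrix" purely for illustration
G = np.array([[1.10, -0.20, 0.05],
              [-0.20, 0.95, 0.10],
              [0.05, 0.10, 1.00]])
Gs = scale_to_0_2(G)
print(Gs.min(), Gs.max())   # endpoints of the new scale
```

Note this changes every element (and hence the implied variance decomposition), which is why each rescaled G on the slide "may provide a different variance component decomposition".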
26 FOCUS: GWAS FOCUS: HERITABILITY FOCUS: KINSHIP
27 MARKERS ARE NOT QTL: a disconnect
28 QTLs in LD with markers vs. QTLs in LE with markers.
29 Off-diagonals of genomic correlations among 500 individuals: MARKERS (OBSERVABLE) vs. QTLs (UNOBSERVABLE).
30 BACK TO BASIC SETTING: linear regression on markers
31 BLUP: β̂ = Cov(β, y') [Var(y)]⁻¹ [y − E(y)].
BRUTE FORCE 1: β̂ = σ²_β X' (XX' σ²_β + I σ²_e)⁻¹ y.
BRUTE FORCE 2: β̂ = X' (XX' + λ I)⁻¹ y, with λ = σ²_e / σ²_β.
BRUTE FORCE: invert n × n and then map via the p × n matrix X'. MME: invert p × p. NO COMPELLING REASON FOR MME HERE.
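That the brute-force BLUP form agrees with the ridge form is the algebraic identity (X'X + λI)⁻¹X' = X'(XX' + λI)⁻¹. A numerical sketch with simulated data (sizes, seed, and variances are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 30, 100                      # records, markers (p > n case)
X = rng.normal(size=(n, m))
y = rng.normal(size=n)
lam = 0.7 / 0.3                     # lambda = var_e / var_beta

# Ridge form: solve a p x p system
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

# "Brute force" BLUP form: solve an n x n system, then map back via X'
b_blup = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

print(np.allclose(b_ridge, b_blup))   # the two representations agree
```

With p markers and n records, one inverts whichever system is smaller; with p >> n (the genomic setting), the n × n "brute force" route is the cheaper one, which is the slide's point about the MME.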
32 MARKED GENOTYPIC VALUE: ĝ = X β̂. SAME RESULT: BOTH n × n COMPUTATIONS REQUIRED.
33 Estimate marker effects from genomic BLUP? Use standard BLUP theory under normality, via "ITERATED EXPECTATIONS":
β̂ = E(β | y, variance components) = E[E(β | g) | y], where g = Xβ.
Under normality: E(β | Xβ) = E(β) + Cov(β, β'X') [Var(Xβ)]⁻¹ [Xβ − E(Xβ)] = σ²_β X' (XX' σ²_β)⁻¹ Xβ = X' (XX')⁻¹ Xβ.
Hence β̂ = X' (XX')⁻¹ E(g | y) = X' (XX')⁻¹ ĝ, where
ĝ = E(g | y, variance components) = [I + (XX')⁻¹ σ²_e / (σ²_g / V_M,HW)]⁻¹ y. [REMEMBER THIS]
34 BRUTE-FORCE DEFINITION: BLUP is a conditional expectation under normality.
β̂ = E(β | y, variance components) = Cov(β, y') [Var(y)]⁻¹ y = σ²_β X' (XX' σ²_β + I σ²_e)⁻¹ y = X' (XX' + λ I)⁻¹ y [REMEMBER?]
CAN GO BACK AND FORTH BETWEEN GENOMIC BLUP AND RIDGE-REGRESSION ESTIMATES OF MARKER EFFECTS: β̂ = X' (XX')⁻¹ ĝ and ĝ = X β̂.
35 BACK TO GENOMIC BLUP. When should a specific representation of GBLUP be used? Suppose p < n. Then G has rank at most p, and the inverse of G does not exist.

rm(list=ls(all=TRUE))
### LOAD PACKAGES
library(MASS)
library(BGLR)
library(lattice)
library(Matrix)
set.seed( )

### LOAD DATA
data(wheat)
Y <- wheat.Y
X <- wheat.X
y <- Y[,1]
n <- length(y)
X <- X[,1:50]
freq <- numeric(ncol(X))
for (j in 1:ncol(X)){
  freq[j] <- mean(X[,j])
}
X <- scale(X, center = TRUE, scale = FALSE)

### Markers are binary, so the variance of marker codes is p(1-p)
### instead of 2p(1-p) per locus
varHW <- sum(freq*(1-freq))
varHW

### GVR = genomic relationship a la Van Raden (2008)
GVR <- X %*% t(X) / varHW
par(mfrow=c(2,1))
vecGVR <- as.vector(GVR)
hist(GVR, main="Distribution of elements of GVR", xlab="GVR values")
diagGVR <- diag(GVR)
plot(diagGVR, ylab="Diagonal values", main="Diagonal values of GVR")
par(mfrow=c(1,1))
summary(vecGVR)
summary(diagGVR)

[Histogram of the elements of GVR, plot of its diagonal values, and the summary() output are not recoverable from the extraction.]
ISSUE HERE: SCALE DIFFERS FROM THAT OF A!
36 Calculation of GBLUP (once one has arrived at some G), for the zero-mean model y = g + e, g ~ (0, G σ²_g), e ~ (0, I σ²_e):
ĝ = E(g | y) = Cov(g, y') [Var(y)]⁻¹ y = G σ²_g (G σ²_g + I σ²_e)⁻¹ y = G (G + λ I)⁻¹ y = (I + G⁻¹ λ)⁻¹ y, with λ = σ²_e / σ²_g.
Var(g | y) = Var(g) − Cov(g, y') [Var(y)]⁻¹ Cov(y, g') = G σ²_g − G σ²_g (G σ²_g + I σ²_e)⁻¹ G σ²_g = [I − G (G + λ I)⁻¹] G σ²_g.
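From Var(g | y) one obtains the prediction error variances and the model-derived reliabilities used in the R scripts that follow; a standard formulation (notation assumed):

```latex
\begin{align*}
\mathrm{PEV}_i &= \mathrm{Var}(g_i \mid y)
 = \left\{ \left[\, I - G\,(G + \lambda I)^{-1} \right] G\,\sigma_g^{2} \right\}_{ii},\\
\mathrm{REL}_i &= 1 - \frac{\mathrm{PEV}_i}{\sigma_g^{2}}.
\end{align*}
```

A reliability near 1 means the data have resolved that individual's genomic value almost completely; values near 0 mean the prediction is essentially the prior mean.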
37 #### Does GVR have an inverse in this case? No: rank(GVR) should be 50
> GVRinv <- chol2inv(chol(GVR))
Error in chol2inv(chol(GVR)) : error in evaluating the argument 'x' in selecting a method for function 'chol2inv': Error in chol.default(GVR) : the leading minor of order 38 is not positive definite
> rankMatrix(GVR)
Warning in rankMatrix(GVR) : rankMatrix(<large sparse Matrix>, method = 'tolNorm2') coerces to dense matrix. Probably should rather use method = 'qrLinpack'!?
[1] 50
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] e-11

MUST USE THE "STRONG ARM" FOR CALCULATING GBLUP:
ĝ = GVR (GVR + λ I)⁻¹ y, with λ = σ²_e / σ²_g.
Var(g | y) = [I − GVR (GVR + λ I)⁻¹] GVR σ²_g.
38 Suppose σ²_g = 0.30 and σ²_e = 0.70.

#### Compute GBLUP using the strong-arm method
#### varg=0.30, vare=0.70
varg <- 0.30
vare <- 0.70
lambda <- vare/varg
Vstar <- (GVR + lambda*diag(n))
Vstarinv <- chol2inv(chol(Vstar))
ghat <- GVR %*% Vstarinv %*% y
plot(ghat, ylab="GBLUP", main="Genomic BLUP (Van Raden G) varg=0.30 vare=0.70")
### Compute the prediction error variance-covariance matrix
PEVMAT <- varg*(diag(n) - GVR %*% Vstarinv) %*% GVR
### Calculate model-derived reliabilities
RELS <- (varg*diag(n) - PEVMAT)/varg
RELGBLUPS <- diag(RELS)
plot(RELGBLUPS, ylab="REL", main="Reliabilities of G-BLUP (Van Raden G)")

[Plots of the GBLUP values and of their reliabilities against individual index are not recoverable from the extraction.]
No evidence of overfit.
39 ##### Impact of the G-matrix on GBLUP
##### Assume the same variance decomposition
##### Scale to be in (0,2)
GVscaled <- matrix(nrow=nrow(X), ncol=nrow(X))
VRmin <- min(GVR)
VRmax <- max(GVR)
for (i in 1:nrow(X)){
  for (j in 1:nrow(X)){
    GVscaled[i,j] <- 2*(GVR[i,j]-VRmin)/(VRmax-VRmin)
  }
}

#### How does it compare with A?
A <- wheat.A
par(mfrow=c(2,1))
hist(A, main="Histogram of A scaled")
hist(GVscaled, main="Histogram of GVR scaled in (0,2)")
par(mfrow=c(1,1))
cor(as.vector(A), as.vector(GVscaled))

[Histograms of A and of the scaled GVR, and the correlation value, are not recoverable from the extraction.]
40 ##### BLUP (assume the same variance decomposition)
varg <- 0.30
vare <- 0.70
lambda <- vare/varg
VstarGVS <- (GVscaled + lambda*diag(n))
VstarinvGVS <- chol2inv(chol(VstarGVS))
ghatGVS <- GVscaled %*% VstarinvGVS %*% y

VstarA <- (A + lambda*diag(n))
VstarinvA <- chol2inv(chol(VstarA))
ghatA <- A %*% VstarinvA %*% y

par(mfrow=c(3,1))
plot(ghatA, ghatGVS, main="BLUP A vs GBLUP GVS")
plot(ghatA, ghat, main="BLUP A vs GBLUP GVR")
plot(ghat, ghatGVS, main="GBLUP GVR vs GBLUP GVS")
par(mfrow=c(1,1))
cor(ghatA, ghat)
cor(ghatA, ghatGVS)
cor(ghat, ghatGVS)

mseA <- sum((y-ghatA)**2)/n
msehat <- sum((y-ghat)**2)/n
msehatGVS <- sum((y-ghatGVS)**2)/n
mseA
msehat
msehatGVS

[Scatterplots, correlation values, and MSE values are not recoverable from the extraction.]
BLUP(A) FITS BETTER (may predict worse).
41 GENERALIZED CV IN GBLUP (zero-means model, wheat data)
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationHypothesis Test-Confidence Interval connection
Hyothesis Test-Confidence Interval connection Hyothesis tests for mean Tell whether observed data are consistent with μ = μ. More secifically An hyothesis test with significance level α will reject the
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationBehavioral Data Mining. Lecture 7 Linear and Logistic Regression
Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationMethods for Cryptic Structure. Methods for Cryptic Structure
Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More informationp(-,i)+p(,i)+p(-,v)+p(i,v),v)+p(i,v)
Multile Sequence Alignment Given: Set of sequences Score matrix Ga enalties Find: Alignment of sequences such that otimal score is achieved. Motivation Aligning rotein families Establish evolutionary relationshis
More informationGaussian Models
Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationSupplementary Note on Bayesian analysis
Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan
More information