GWAS with mixed models


1 GWAS with mixed models (the trip from $10^0$ to $10^8$) Yurii Aulchenko yurii [dot] Aulchenko [at] gmail [dot] com YuriiA consulting

2 Outline
Methodology development
Speeding things up
Simplifying the math
From math to software
Conclusions & remarks

14 Methodology development
[Diagram: Method (math), Algorithm, Implementation, Data. Outcomes marked on the diagram: Fine! / Wrong / Too slow]

17 Mixed Models for GWAS
Natural way to model correlated data
Model the distribution of phenotypes as $y_i = \mu + \beta g_i + G_i + \varepsilon_i$, where $\beta$ is the effect of a SNP and $G$ is distributed as multivariate normal with VC-matrix proportional to the relationship matrix
Parameters: $\{\mu, \beta, h^2, \sigma^2\}$
ML way: apply LR to test significance of $\beta$
Problem ('07): estimating the model for a single SNP takes about 15 minutes. Single GWAS = a few years
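The "few years" figure is plain arithmetic: one model fit per SNP, run one after another. A rough sketch, where the SNP count of 100,000 is a hypothetical array size chosen for illustration (the slides do not state one):

```python
MINUTES_PER_SNP = 15  # per-SNP fitting time quoted on the slide


def gwas_years(n_snps, minutes_per_snp=MINUTES_PER_SNP):
    """Wall-clock years needed to fit n_snps models sequentially."""
    minutes_per_year = 60 * 24 * 365
    return n_snps * minutes_per_snp / minutes_per_year


# 100,000 SNPs is an assumed, illustrative number of markers.
print(round(gwas_years(100_000), 2))  # 2.85
```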

18 Where can we improve?
Implementation, Algorithm, Method (math)

19 Two-step / score test
The main problem is the estimation of $h^2$ each time we introduce a new SNP into the model
If we assume that a SNP has a small effect on the trait, then its inclusion into the model should not change the estimate of $h^2$ much
Therefore a two-step estimation approach can be used:
First, estimate $h^2$ using the MM without the SNP: $y_i = \mu + G_i + \varepsilon_i$
Then use the same estimate $\hat{h}^2$ to correct the test of association for every SNP genome-wide
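A minimal sketch of the first step (fitting the SNP-free model once) may help. It assumes the phenotype covariance is parametrised as sigma2 * (h2 * Phi + (1 - h2) * I), with Phi the relationship matrix, and maximises the likelihood over a grid of h2 values after one eigendecomposition of Phi. The parametrisation, the grid search and the name `estimate_h2_null` are assumptions for illustration, not the speaker's implementation.

```python
import numpy as np


def estimate_h2_null(y, Phi, grid=None):
    """ML estimates (h2, mu, sigma2) for y ~ N(mu, sigma2*(h2*Phi + (1-h2)*I)).

    Phi is assumed symmetric positive semi-definite.
    """
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)
    n = y.shape[0]
    s, U = np.linalg.eigh(Phi)        # Phi = U diag(s) U^T, computed once
    y_rot = U.T @ y                   # rotate data so the covariance is diagonal
    one_rot = U.T @ np.ones(n)        # rotated intercept column
    best = (-np.inf, None, None, None)
    for h2 in grid:
        d = h2 * s + (1.0 - h2)       # diagonal of the rotated covariance / sigma2
        mu = np.sum(one_rot * y_rot / d) / np.sum(one_rot ** 2 / d)
        resid = y_rot - mu * one_rot
        sigma2 = np.sum(resid ** 2 / d) / n
        # Profile log-likelihood with mu and sigma2 at their ML values.
        ll = -0.5 * (n * np.log(2 * np.pi * sigma2) + np.sum(np.log(d)) + n)
        if ll > best[0]:
            best = (ll, h2, mu, sigma2)
    return best[1], best[2], best[3]
```

The returned h2 and sigma2 would then be held fixed while every SNP is tested, which is what removes the per-SNP re-estimation cost.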

20 FASTA way (Chen & Abecasis, AJHG, '07)
The obtained estimates are used to construct the variance-covariance matrix for the data, $\hat{\Sigma}$
Score test is constructed accounting for $\hat{\Sigma}$: $T_i^2 = \frac{(\bar{g}_i^T \hat{\Sigma}^{-1} \bar{Y})^2}{\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i}$
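The score statistic on this slide is the squared covariance-weighted cross-product of centred genotype and centred phenotype, divided by the covariance-weighted genotype sum of squares. A sketch under stated assumptions: the covariance is built as sigma2 * (h2 * Phi + (1 - h2) * I) from previously estimated h2 and sigma2, and centring uses plain means for simplicity. The function name `fasta_score_tests` is hypothetical and this is not the published FASTA code.

```python
import numpy as np


def fasta_score_tests(G, y, Phi, h2, sigma2):
    """Score statistic for each column (SNP) of the n x m genotype matrix G.

    Assumes every SNP is polymorphic (non-zero variance).
    """
    n = y.shape[0]
    Sigma = sigma2 * (h2 * Phi + (1.0 - h2) * np.eye(n))
    y_c = y - y.mean()                        # centred phenotype
    G_c = G - G.mean(axis=0)                  # centred genotypes
    Sinv_y = np.linalg.solve(Sigma, y_c)      # Sigma^{-1} y, computed once
    Sinv_G = np.linalg.solve(Sigma, G_c)      # Sigma^{-1} g for all SNPs
    num = (G_c.T @ Sinv_y) ** 2               # squared numerator, one per SNP
    den = np.einsum("ij,ij->j", G_c, Sinv_G)  # g^T Sigma^{-1} g, one per SNP
    return num / den
```

With Phi set to the identity and h2 = 0 the statistic reduces to n times the squared Pearson correlation, which is a convenient sanity check.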

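21 GRAMMAR way (Aulchenko et al., '07; Amin et al., '07; Svishcheva et al., Nat Genet, '12)
Define $Y^{*} = \hat{\Sigma}^{-1} \bar{Y}$
Approximate $\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i$ with $\bar{g}_i^T \gamma(\hat{\Sigma}) \bar{g}_i$, where $\gamma(\hat{\Sigma})$ is a scalar
$T_i^2 = \frac{(\bar{g}_i^T \hat{\Sigma}^{-1} \bar{Y})^2}{\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i} \approx \frac{1}{\gamma(\hat{\Sigma})} \frac{(\bar{g}_i^T Y^{*})^2}{\bar{g}_i^T \bar{g}_i}$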
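The gain of this approximation is that the inverse covariance is applied once to the phenotype, after which each SNP costs only two dot products. The sketch below illustrates the idea. How the scalar gamma is obtained here (the mean ratio of weighted to unweighted genotype sums of squares over a random subset of SNPs) is an assumption made for illustration, as are the name `grammar_gamma_tests` and its parameters.

```python
import numpy as np


def grammar_gamma_tests(G, y, Sigma, n_calib=100, seed=0):
    """Approximate score statistics using a single scalar correction gamma.

    Assumes every SNP (column of G) is polymorphic.
    """
    rng = np.random.default_rng(seed)
    y_c = y - y.mean()
    G_c = G - G.mean(axis=0)
    y_star = np.linalg.solve(Sigma, y_c)      # Sigma^{-1} y, done once
    m = G_c.shape[1]
    # Calibrate gamma on a subset of SNPs (illustrative choice).
    idx = rng.choice(m, size=min(n_calib, m), replace=False)
    G_sub = G_c[:, idx]
    weighted = np.einsum("ij,ij->j", G_sub, np.linalg.solve(Sigma, G_sub))
    plain = np.einsum("ij,ij->j", G_sub, G_sub)
    gamma = np.mean(weighted / plain)
    # Per-SNP work is now O(n): two dot products per SNP.
    gtg = np.einsum("ij,ij->j", G_c, G_c)
    T2 = (G_c.T @ y_star) ** 2 / (gamma * gtg)
    return T2, gamma
```

When the weighted-to-unweighted ratio varies a lot between SNPs, a single scalar is a poor summary, which matches the accuracy caveat raised later in the talk.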

22 Speed comparison

25 Accuracy of Grammar-γ
$T_i^2 = \frac{(\bar{g}_i^T \hat{\Sigma}^{-1} \bar{Y})^2}{\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i} \approx \frac{1}{\gamma(\hat{\Sigma})} \frac{(\bar{g}_i^T Y^{*})^2}{\bar{g}_i^T \bar{g}_i}$
[Figure: test statistics from GRAMMAR-Gamma and LR-GC (vertical axis) against FASTA (horizontal axis). Panels (a): Height (n = 2,592), BMI (n = 2,591), HDL (n = 2,585). Panels (b): FRI (n = 164), avrpphb (n = 90), avrrpm1 (n = 84).]
Grey dots: FASTA vs LM. Black dots: Grammar-γ vs FASTA. Upper row: human data (almost the same); lower row: A. thaliana highly structured data (less accurate) (Svishcheva et al., Nat Genet, '12)
Sub-optimal approximation when Var($\gamma_m$) is large; easily tested

28 More general problem: GWAS for multiple traits
Let us step back to '10 (Grammar-γ not there yet)
'07-'10: from 15 minutes for a single SNP to 15 minutes for a GWAS
What if we have 100,000 traits? Back to several years?!
Treatment of the problem for an arbitrary number of traits, t
Using the FASTA approach: a sequence of GLS problems
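To make "a sequence of GLS problems" concrete: for every trait j and SNP i one solves a generalised least-squares problem with design [1, g_i] and the trait-specific covariance. The naive double loop below is only a reference sketch. The covariance parametrisation sigma2_j * (h2_j * Phi + (1 - h2_j) * I) and the name `gls_grid` are assumptions, and this is not one of the CLAK-generated algorithms.

```python
import numpy as np


def gls_grid(G, Y, Phi, h2, sigma2):
    """GLS estimates [intercept, SNP effect] for each SNP i and trait j.

    G: n x m genotypes, Y: n x t phenotypes, h2 and sigma2: length-t arrays.
    Returns B with shape (m, t, 2).
    """
    n, m = G.shape
    t = Y.shape[1]
    B = np.empty((m, t, 2))
    for j in range(t):
        Sigma_j = sigma2[j] * (h2[j] * Phi + (1.0 - h2[j]) * np.eye(n))
        L = np.linalg.cholesky(Sigma_j)          # Sigma_j = L L^T, once per trait
        # Whitening: multiply by L^{-1}. np.linalg.solve is used for brevity;
        # a triangular solver would exploit the structure of L.
        y_w = np.linalg.solve(L, Y[:, j])
        ones_w = np.linalg.solve(L, np.ones(n))
        G_w = np.linalg.solve(L, G)              # all SNPs whitened together
        for i in range(m):
            X_w = np.column_stack([ones_w, G_w[:, i]])
            # Ordinary least squares on whitened data equals GLS on the original.
            B[i, j] = np.linalg.solve(X_w.T @ X_w, X_w.T @ y_w)
    return B
```

The structure already hints at where time can be saved: the factorisation depends only on the trait, and the whitening of the genotypes can be shared across SNPs.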

32 Where can we improve?
Implementation, Algorithm, Method (math)
Work in collaboration with Prof. Bientinesi and Mr. Fabregat-Traver, RWTH Aachen

33 Algorithms with CLAK
CLAK: system for automatic generation of algorithms
Twenty algorithms generated
Two selected: one for few-trait (<=10), and one for multi-trait (>10) GWAS

34 Implementation
Effective factorization
Grouping and use of multi-threaded BLAS-3 for large matrix by matrix operations
Custom thread-based parallelization
Double buffering and asynchronous data transfers
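The "grouping" item can be illustrated as follows: instead of transforming one SNP at a time (a matrix-vector product per SNP), SNPs are collected into blocks so that each step is a single matrix-matrix product, which NumPy hands to a BLAS-3 routine when it is linked against a BLAS library. This is a sketch of the idea only. The name `apply_in_blocks` and the block size are hypothetical, and threading, double buffering and asynchronous transfers are not shown.

```python
import numpy as np


def apply_in_blocks(W, G, block_size=1024):
    """Yield (start, W @ G[:, start:stop]) for consecutive blocks of SNPs.

    W: n x n transformation (for example a whitening matrix).
    G: n x m genotype matrix, possibly too large to transform in one go.
    """
    m = G.shape[1]
    for start in range(0, m, block_size):
        stop = min(start + block_size, m)
        # One matrix-matrix product per block instead of (stop - start)
        # separate matrix-vector products.
        yield start, W @ G[:, start:stop]
```

In a streaming setting each block would be read from disk, transformed and consumed before the next one is loaded, which is the kind of pipeline where overlapping input/output with computation becomes useful.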

41 Speed comparison (Large sample; Many SNPs; Multi-trait GWAS)
[Figure: run time of EMMAX, GWFGLS, FaST-LMM and CLAK-Chol against sample size (n) and number of SNPs (m), and of EMMAX, FaST-LMM, GWFGLS and CLAK-Eig against number of traits (t). Relative run times shown: EMMAX 2789x, FaST-LMM 1352x, GWFGLS 1012x, CLAK-Eig 1x.]
Running on real data: Metabolome GWAS, >100,000 traits. Finished in 8 hours

44 Conclusions
Enormous progress: from 15 minutes for a single SNP test to 100(s) of GWAS in 15 minutes (x10,000,000)
Problem knowledge: in Method-Algorithm-Implementation, every step counts
Practical way to produce practical methods: agile methodology (short feedback loop)

45 The GenABEL project
Implementing the agile methodology framework
Open source
Free (as in freedom)
Collaborative, open, aiming to provide an agile environment

47 Stay tuned!

48 Method/algorithmic similarities
Animal breeding literature/methods from 19/70s
FMM of W. Astle and D. Balding ('08) (MixABEL, '10)
FaST-LMM of Lippert et al. ('12); GEMMA of Zhou & Stephens ('12)
FASTA-like: EMMAX of Kang et al. ('10), P3D of Zhang et al. ('12)

49 YuriiA consulting


More information

Mobilizing genetic resources and optimizing breeding programs DO NOT COPY. J.-F. Rami UMR AGAP

Mobilizing genetic resources and optimizing breeding programs DO NOT COPY. J.-F. Rami UMR AGAP Mobilizing genetic resources and optimizing breeding programs J.-F. Rami UMR AGAP Genetic Diversity Outline characterization of ex situ Genetic Diversity dynamics of in situ diversity diversity and society

More information

The Matrix Algebra of Sample Statistics

The Matrix Algebra of Sample Statistics The Matrix Algebra of Sample Statistics James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) The Matrix Algebra of Sample Statistics

More information

Ten years of progress in Identification for Control. Outline

Ten years of progress in Identification for Control. Outline Ten years of progress in Identification for Control Design and Optimization of Restricted Complexity Controllers Grenoble Workshop, 15-16 January, 2003 Michel Gevers CESAME - UCL, Louvain-la-Neuve, Belgium

More information

Math 180B Problem Set 3

Math 180B Problem Set 3 Math 180B Problem Set 3 Problem 1. (Exercise 3.1.2) Solution. By the definition of conditional probabilities we have Pr{X 2 = 1, X 3 = 1 X 1 = 0} = Pr{X 3 = 1 X 2 = 1, X 1 = 0} Pr{X 2 = 1 X 1 = 0} = P

More information

3 Comparison with Other Dummy Variable Methods

3 Comparison with Other Dummy Variable Methods Stats 300C: Theory of Statistics Spring 2018 Lecture 11 April 25, 2018 Prof. Emmanuel Candès Scribe: Emmanuel Candès, Michael Celentano, Zijun Gao, Shuangning Li 1 Outline Agenda: Knockoffs 1. Introduction

More information

VARIANCE COMPONENT ESTIMATION & BEST LINEAR UNBIASED PREDICTION (BLUP)

VARIANCE COMPONENT ESTIMATION & BEST LINEAR UNBIASED PREDICTION (BLUP) VARIANCE COMPONENT ESTIMATION & BEST LINEAR UNBIASED PREDICTION (BLUP) V.K. Bhatia I.A.S.R.I., Library Avenue, New Delhi- 11 0012 vkbhatia@iasri.res.in Introduction Variance components are commonly used

More information

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures

More information

Genotype Imputation. Biostatistics 666

Genotype Imputation. Biostatistics 666 Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives

More information

Manual for ProbABEL v0.5.0

Manual for ProbABEL v0.5.0 Manual for ProbABEL v0.5.0 Current Programmers: Lennart Karssen 1, Maarten Kooyman 2, Yurii Aulchenko 1,3 Former Programmers: Maksim Struchalin 1 PolyOmica, Groningen, The Netherlands 2 Erasmus MC, Rotterdam,

More information

Jun Zhang Department of Computer Science University of Kentucky

Jun Zhang Department of Computer Science University of Kentucky Application i of Wavelets in Privacy-preserving Data Mining Jun Zhang Department of Computer Science University of Kentucky Outline Privacy-preserving in Collaborative Data Analysis Advantages of Wavelets

More information

Supplementary File 3: Tutorial for ASReml-R. Tutorial 1 (ASReml-R) - Estimating the heritability of birth weight

Supplementary File 3: Tutorial for ASReml-R. Tutorial 1 (ASReml-R) - Estimating the heritability of birth weight Supplementary File 3: Tutorial for ASReml-R Tutorial 1 (ASReml-R) - Estimating the heritability of birth weight This tutorial will demonstrate how to run a univariate animal model using the software ASReml

More information

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Modeling Sub-Visible Particle Data Product Held at Accelerated Stability Conditions José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Outline Sub-Visible Particle (SbVP) Poisson Negative Binomial

More information

Genetic parameters for female fertility in Nordic dairy cattle

Genetic parameters for female fertility in Nordic dairy cattle Genetic parameters for female fertility in Nordic dairy cattle K.Muuttoranta 1, A-M. Tyrisevä 1, E.A. Mäntysaari 1, J.Pösö 2, G.P. Aamand 3, J-Å. Eriksson 4, U.S. Nielsen 5, and M. Lidauer 1 1 Natural

More information

Research Methodology: Tools

Research Methodology: Tools MSc Business Administration Research Methodology: Tools Applied Data Analysis (with SPSS) Lecture 05: Contingency Analysis March 2014 Prof. Dr. Jürg Schwarz Lic. phil. Heidi Bruderer Enzler Contents Slide

More information

November 2002 STA Random Effects Selection in Linear Mixed Models

November 2002 STA Random Effects Selection in Linear Mixed Models November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear

More information

Lecture 9 Multi-Trait Models, Binary and Count Traits

Lecture 9 Multi-Trait Models, Binary and Count Traits Lecture 9 Multi-Trait Models, Binary and Count Traits Guilherme J. M. Rosa University of Wisconsin-Madison Mixed Models in Quantitative Genetics SISG, Seattle 18 0 September 018 OUTLINE Multiple-trait

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

The Quantitative TDT

The Quantitative TDT The Quantitative TDT (Quantitative Transmission Disequilibrium Test) Warren J. Ewens NUS, Singapore 10 June, 2009 The initial aim of the (QUALITATIVE) TDT was to test for linkage between a marker locus

More information

Quantitative characters - exercises

Quantitative characters - exercises Quantitative characters - exercises 1. a) Calculate the genetic covariance between half sibs, expressed in the ij notation (Cockerham's notation), when up to loci are considered. b) Calculate the genetic

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April

More information

GBLUP and G matrices 1

GBLUP and G matrices 1 GBLUP and G matrices 1 GBLUP from SNP-BLUP We have defined breeding values as sum of SNP effects:! = #$ To refer breeding values to an average value of 0, we adopt the centered coding for genotypes described

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

A Robust Test for Two-Stage Design in Genome-Wide Association Studies

A Robust Test for Two-Stage Design in Genome-Wide Association Studies Biometrics Supplementary Materials A Robust Test for Two-Stage Design in Genome-Wide Association Studies Minjung Kwak, Jungnam Joo and Gang Zheng Appendix A: Calculations of the thresholds D 1 and D The

More information

Inference using structural equations with latent variables

Inference using structural equations with latent variables This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Distinctive aspects of non-parametric fitting

Distinctive aspects of non-parametric fitting 5. Introduction to nonparametric curve fitting: Loess, kernel regression, reproducing kernel methods, neural networks Distinctive aspects of non-parametric fitting Objectives: investigate patterns free

More information

Gene mapping in model organisms

Gene mapping in model organisms Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2

More information

Remark 3.2. The cross product only makes sense in R 3.

Remark 3.2. The cross product only makes sense in R 3. 3. Cross product Definition 3.1. Let v and w be two vectors in R 3. The cross product of v and w, denoted v w, is the vector defined as follows: the length of v w is the area of the parallelogram with

More information

Flexible phenotype simulation with PhenotypeSimulator Hannah Meyer

Flexible phenotype simulation with PhenotypeSimulator Hannah Meyer Flexible phenotype simulation with PhenotypeSimulator Hannah Meyer 2018-03-01 Contents Introduction 1 Work-flow 2 Examples 2 Example 1: Creating a phenotype composed of population structure and observational

More information

Properties of the least squares estimates

Properties of the least squares estimates Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Hierarchical generalized linear models a Lego approach to mixed models

Hierarchical generalized linear models a Lego approach to mixed models Hierarchical generalized linear models a Lego approach to mixed models Lars Rönnegård Högskolan Dalarna Swedish University of Agricultural Sciences Trondheim Seminar Aliations Hierarchical Generalized

More information

Matrix Factorizations

Matrix Factorizations 1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular

More information

The concept of breeding value. Gene251/351 Lecture 5

The concept of breeding value. Gene251/351 Lecture 5 The concept of breeding value Gene251/351 Lecture 5 Key terms Estimated breeding value (EB) Heritability Contemporary groups Reading: No prescribed reading from Simm s book. Revision: Quantitative traits

More information

Estimation of the Angular Density in Multivariate Generalized Pareto Models

Estimation of the Angular Density in Multivariate Generalized Pareto Models in Multivariate Generalized Pareto Models René Michel michel@mathematik.uni-wuerzburg.de Institute of Applied Mathematics and Statistics University of Würzburg, Germany 18.08.2005 / EVA 2005 The Multivariate

More information