GWAS with mixed models
1 GWAS with mixed models (the trip from 10^0 to 10^8). Yurii Aulchenko, yurii [dot] Aulchenko [at] gmail [dot] com, YuriiA consulting
2 Outline: Methodology development; Speeding things up; Simplifying the math; From math to software; Conclusions & remarks
3 Methodology development (diagram built up over several slides): layers Method (math), Algorithm, Implementation, applied to Data. Outcome labels on the diagram: Fine!, Wrong, Too slow.
15 Mixed Models for GWAS
A natural way to model correlated data. Model the distribution of phenotypes as $y_i = \mu + \beta g_i + G_i + \varepsilon_i$, where $\beta$ is the effect of a SNP and $G$ is distributed as multivariate normal with a variance-covariance matrix proportional to the relationship matrix.
Parameters: $\{\mu, \beta, h^2, \sigma^2\}$.
ML way: apply a likelihood-ratio (LR) test to assess the significance of $\beta$.
Problem ('07): estimating the model for a single SNP takes about 15 minutes, so a single GWAS takes a few years.
18 Where can we improve? Method (math), Algorithm, Implementation
19 Two-step / score test
The main problem is the estimation of $h^2$ each time we introduce a new SNP into the model. If we assume that a SNP has a small effect on the trait, then its inclusion in the model should not change the estimate of $h^2$ much. Therefore a two-step estimation approach can be used.
First, estimate $h^2$ using the mixed model without the SNP: $y_i = \mu + G_i + \varepsilon_i$.
Then use the same estimate $\hat{h}^2$ to correct the test of association for every SNP genome-wide.
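The first step can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes the covariance has the form $\sigma^2 (h^2 K + (1-h^2) I)$ with a known relationship matrix `K`, and it maximises the profile likelihood over a plain grid of $h^2$ values. The function name `estimate_h2_null` and the grid are hypothetical choices.

```python
import numpy as np

def estimate_h2_null(y, K, grid=None):
    """Profile-ML estimate of h2 in the SNP-free model y = mu + G + e,
    assuming Var(y) = s2 * (h2 * K + (1 - h2) * I)."""
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)   # hypothetical search grid
    n = len(y)
    evals, U = np.linalg.eigh(K)             # K = U diag(evals) U^T, done once
    y_rot = U.T @ y                          # rotated phenotype
    one_rot = U.T @ np.ones(n)               # rotated intercept column

    def profile(h2):
        d = h2 * evals + (1.0 - h2)          # eigenvalues of h2*K + (1-h2)*I
        mu = np.sum(one_rot * y_rot / d) / np.sum(one_rot ** 2 / d)
        resid = y_rot - mu * one_rot
        s2 = np.sum(resid ** 2 / d) / n      # ML estimate of the total variance
        ll = -0.5 * (n * np.log(2 * np.pi * s2) + np.sum(np.log(d)) + n)
        return ll, mu, s2

    fits = [profile(h2) for h2 in grid]
    logliks = np.array([f[0] for f in fits])
    best = int(np.argmax(logliks))
    return grid[best], fits[best][1], fits[best][2], logliks

# Toy usage with a simulated positive-definite relationship matrix.
rng = np.random.default_rng(1)
n = 200
Z = rng.standard_normal((n, 400))
K = Z @ Z.T / 400
G_eff = np.linalg.cholesky(K + 1e-8 * np.eye(n)) @ rng.standard_normal(n) * np.sqrt(0.5)
y = 1.0 + G_eff + rng.standard_normal(n) * np.sqrt(0.5)
h2_hat, mu_hat, s2_hat, logliks = estimate_h2_null(y, K)
sigma_hat = s2_hat * (h2_hat * K + (1.0 - h2_hat) * np.eye(n))  # reused for every SNP
```

The point of the design is that `sigma_hat` is computed once and then held fixed for the whole genome scan, which is what removes the per-SNP re-estimation cost.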
20 FASTA way (Chen & Abecasis, AJHG, '07)
The obtained estimates are used to construct the variance-covariance matrix for the data, $\hat{\Sigma}$. The score test is constructed accounting for $\hat{\Sigma}$:
$$T_i^2 = \frac{(\bar{g}_i^T \hat{\Sigma}^{-1} \bar{Y})^2}{\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i}$$
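A minimal sketch of this score statistic follows. It assumes that the barred quantities are the genotype and phenotype centred on their sample means (the published method centres on model-based expectations), and that the estimated covariance matrix is passed in as `sigma_hat`. The function name is hypothetical; the p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math
import numpy as np

def fasta_score_tests(G, y, sigma_hat):
    """Score statistic T_i^2 for every SNP (column of G), with Sigma-hat held fixed."""
    g_c = G - G.mean(axis=0)                   # centred genotypes (assumed meaning of g-bar)
    y_c = y - y.mean()                         # centred phenotype (assumed meaning of Y-bar)
    sinv_y = np.linalg.solve(sigma_hat, y_c)   # Sigma^-1 Y, computed once
    sinv_g = np.linalg.solve(sigma_hat, g_c)   # Sigma^-1 g_i for all SNPs at once
    num = (g_c.T @ sinv_y) ** 2                # (g_i^T Sigma^-1 Y)^2
    den = np.einsum("ij,ij->j", g_c, sinv_g)   # g_i^T Sigma^-1 g_i
    t2 = num / den                             # monomorphic SNPs give 0/0 here
    # Upper tail of chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2))
    pvals = np.array([math.erfc(math.sqrt(t / 2.0)) for t in t2])
    return t2, pvals
```

With `sigma_hat` equal to the sample variance times the identity, the statistic reduces to n times the squared genotype-phenotype correlation, which is a convenient sanity check.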
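21 GRAMMAR way (Aulchenko et al., '07; Amin et al., '07; Svishcheva et al., Nat Genet, '12)
Define $Y^* = \hat{\Sigma}^{-1} \bar{Y}$. Approximate $\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i$ with $\bar{g}_i^T \gamma(\hat{\Sigma}) \bar{g}_i$, where $\gamma(\hat{\Sigma})$ is a scalar:
$$T_i^2 = \frac{(\bar{g}_i^T \hat{\Sigma}^{-1} \bar{Y})^2}{\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i} \approx \frac{1}{\gamma(\hat{\Sigma})} \frac{(\bar{g}_i^T Y^*)^2}{\bar{g}_i^T \bar{g}_i}$$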
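The approximation can be illustrated as follows. The slide does not say how the scalar is obtained, so this sketch makes an assumption: it calibrates gamma as the average ratio of the exact to the naive denominator over a random subset of SNPs. The function name, `n_calib` and `seed` are hypothetical.

```python
import numpy as np

def grammar_gamma(G, y, sigma_hat, n_calib=100, seed=0):
    """Approximate score tests: one solve for Y*, then only dot products per SNP."""
    g_c = G - G.mean(axis=0)                       # centred genotypes
    y_c = y - y.mean()                             # centred phenotype
    y_star = np.linalg.solve(sigma_hat, y_c)       # Y* = Sigma^-1 Y, computed once
    gtg = np.einsum("ij,ij->j", g_c, g_c)          # g_i^T g_i for every SNP

    # Calibrate the scalar gamma on a random subset of SNPs (assumption of this sketch).
    rng = np.random.default_rng(seed)
    idx = rng.choice(G.shape[1], size=min(n_calib, G.shape[1]), replace=False)
    sinv_g = np.linalg.solve(sigma_hat, g_c[:, idx])
    exact_den = np.einsum("ij,ij->j", g_c[:, idx], sinv_g)  # g^T Sigma^-1 g on the subset
    gamma = float(np.mean(exact_den / gtg[idx]))

    t2 = (g_c.T @ y_star) ** 2 / (gamma * gtg)     # (1/gamma) (g^T Y*)^2 / (g^T g)
    return t2, gamma
```

After calibration each SNP costs two length-n dot products instead of a matrix-vector product with the inverse covariance, which is where the speed gain comes from.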
22 Speed comparison [figure not captured in the transcript]
23 Accuracy of GRAMMAR-Gamma
$$T_i^2 = \frac{(\bar{g}_i^T \hat{\Sigma}^{-1} \bar{Y})^2}{\bar{g}_i^T \hat{\Sigma}^{-1} \bar{g}_i} \approx \frac{1}{\gamma(\hat{\Sigma})} \frac{(\bar{g}_i^T Y^*)^2}{\bar{g}_i^T \bar{g}_i}$$
[Figure: scatter plots of test statistics; axis labels "FASTA" and "GRAMMAR-Gamma, LR-GC". Grey dots: FASTA vs LM. Black dots: GRAMMAR-Gamma vs FASTA. Panel a (upper row, human data; $\gamma$ is almost the same): Height (n = 2,592), BMI (n = 2,591; fitted line y = 0.999x), HDL (n = 2,585). Panel b (lower row, A. thaliana, highly structured data; less accurate): FRI (n = 164), avrPphB (n = 90), avrRpm1 (n = 84). Most fitted-line coefficients are lost in the transcript. Source: Svishcheva et al., Nat Genet, '12.]
Overlay on the last build: the approximation is sub-optimal when a variance term is large (the expression is garbled in the transcript); this is easily tested.
26 More general problem: GWAS for multiple traits
Let us step back to '10 (GRAMMAR-Gamma not there yet). '07-'10: from 15 minutes for a single SNP to 15 minutes for a GWAS. What if we have 100,000 traits? Back to several years?!
Treatment of the problem for an arbitrary number of traits, t. Using the FASTA approach: a sequence of GLS problems.
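To make "a sequence of GLS problems" concrete, here is a small sketch under stated assumptions: each SNP i and trait j defines one problem with design matrix [1, g_i] and a trait-specific covariance matrix M_j, and each M_j is Cholesky-factored once so that every per-SNP solve becomes ordinary least squares on whitened data. The function name and the argument layout are hypothetical.

```python
import numpy as np

def gls_sequence(G, Y, M_list):
    """betas[i, j] = GLS estimate [intercept, SNP effect] for SNP i and trait j,
    i.e. (X^T M_j^-1 X)^-1 X^T M_j^-1 y_j with X = [1, g_i]."""
    n, m = G.shape
    t = Y.shape[1]
    betas = np.empty((m, t, 2))
    ones = np.ones((n, 1))
    for j in range(t):
        L = np.linalg.cholesky(M_list[j])            # M_j = L L^T, once per trait
        # Whiten intercept, all SNPs and the phenotype in a single solve.
        # (A triangular solver such as scipy.linalg.solve_triangular would be cheaper.)
        W = np.linalg.solve(L, np.hstack([ones, G, Y[:, [j]]]))
        w1, Wg, wy = W[:, 0], W[:, 1:m + 1], W[:, -1]
        for i in range(m):
            X = np.column_stack([w1, Wg[:, i]])
            betas[i, j] = np.linalg.solve(X.T @ X, X.T @ wy)  # OLS on whitened data = GLS
    return betas
```

The sketch solves m times t small systems after t factorizations; the cost that remains is dominated by whitening the genotype matrix, which is a single large matrix operation per trait.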
29 Where can we improve? Method (math), Algorithm, Implementation. Work in collaboration with Prof. Bientinesi and Mr. Fabregat-Traver, RWTH Aachen.
33 Algorithms with CLAK
CLAK: a system for automatic generation of algorithms. Twenty algorithms generated. Two selected: one for few-trait (<= 10) GWAS and one for multi-trait (> 10) GWAS.
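The slides do not show the generated algorithms, so the following is only an illustration of why a different algorithm can pay off when there are many traits. It assumes the covariance for trait j is $h_j^2 K + (1-h_j^2) I$, so one eigendecomposition of `K` can be shared by all traits and each trait only needs a diagonal reweighting. This is a generic technique, not CLAK output; all names are hypothetical.

```python
import numpy as np

def gls_many_traits(G, Y, K, h2):
    """SNP effect estimates (m x t) for the model y_j = mu + beta * g_i + noise,
    with Var(y_j) proportional to h2[j] * K + (1 - h2[j]) * I."""
    n, m = G.shape
    s, U = np.linalg.eigh(K)                       # K = U diag(s) U^T, shared by all traits
    Gr, Yr, one_r = U.T @ G, U.T @ Y, U.T @ np.ones(n)   # rotate the data once
    effects = np.empty((m, Y.shape[1]))
    for j, h2_j in enumerate(h2):
        w = 1.0 / (h2_j * s + (1.0 - h2_j))        # diagonal of the rotated inverse covariance
        a11 = np.sum(w * one_r ** 2)               # 1^T M^-1 1
        a12 = Gr.T @ (w * one_r)                   # g_i^T M^-1 1 for all SNPs
        a22 = np.einsum("ij,i,ij->j", Gr, w, Gr)   # g_i^T M^-1 g_i for all SNPs
        b1 = np.sum(w * one_r * Yr[:, j])          # 1^T M^-1 y
        b2 = Gr.T @ (w * Yr[:, j])                 # g_i^T M^-1 y for all SNPs
        # Closed-form solution of the 2x2 normal equations, SNP coefficient only.
        effects[:, j] = (a11 * b2 - a12 * b1) / (a11 * a22 - a12 ** 2)
    return effects
```

Per trait this costs a few weighted matrix-vector products and no further factorization, whereas a Cholesky-based variant has to refactor whenever the covariance changes.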
34 Implementation: effective factorization; grouping and use of multi-threaded BLAS-3 for large matrix-by-matrix operations; custom thread-based parallelization; double buffering and asynchronous data transfers.
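Two of these ideas, grouping SNPs into blocks so that the work becomes matrix-by-matrix products and double buffering so that the next block is loaded while the current one is processed, can be sketched as below. This is a toy illustration, not the actual implementation: `load_block` is a hypothetical stand-in for reading genotypes from disk, and a single background thread plays the role of the asynchronous transfer.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def load_block(G, start, stop):
    """Stand-in for reading one block of SNP columns from disk (hypothetical)."""
    return np.array(G[:, start:stop], dtype=float)

def blocked_products(G, Y_white, block_size=256):
    """Compute G^T Y_white block by block, prefetching the next block while computing."""
    m = G.shape[1]
    starts = list(range(0, m, block_size))
    out = np.empty((m, Y_white.shape[1]))
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_block, G, starts[0], min(starts[0] + block_size, m))
        for k, start in enumerate(starts):
            block = pending.result()               # wait for the block loaded in the background
            if k + 1 < len(starts):                # immediately request the next block
                nxt = starts[k + 1]
                pending = pool.submit(load_block, G, nxt, min(nxt + block_size, m))
            # One matrix-matrix product per block instead of one dot product per SNP.
            out[start:start + block.shape[1]] = block.T @ Y_white
    return out
```

Larger blocks give the underlying BLAS library more work per call, while the prefetch hides input latency behind the computation.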
35 Speed comparison
[Figure, built up over several slides: running time of EMMAX, FaST-LMM, GWFGLS, CLAK-Chol and CLAK-Eig as a function of sample size (n), number of SNPs (m) and number of traits (t). Successive builds are titled "Large sample", "Many SNPs" and "Multi-trait GWAS". Relative running times annotated on the figure: EMMAX 2789x, FaST-LMM 1352x, GWFGLS 1012x, CLAK-Eig 1x. Individual timing labels (ranging from hours to 4.58 years) cannot be matched to methods in the transcript.]
Running on real data: metabolome GWAS, >100,000 traits. Finished in 8 hours.
42 Conclusions
Enormous progress: from 15 minutes for a single SNP test to 100(s) of GWAS in 15 minutes (x10,000,000).
Problem knowledge: in Method-Algorithm-Implementation, every step counts.
Practical way to produce practical methods: agile methodology (short feedback loop).
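As a rough check of the quoted factor, the arithmetic can be written out. The slides do not state the number of SNPs per GWAS, so the value of 10^5 used here is purely an illustrative assumption; with it, both the "few years" per GWAS at the starting point and the overall factor of 10^7 come out consistently.

```latex
% Starting point: 15 minutes per SNP, assumed m = 10^5 SNPs per GWAS
15\ \text{min} \times 10^{5} = 1.5 \times 10^{6}\ \text{min} \approx 2.85\ \text{years per GWAS}

% End point: 100 GWAS in the 15 minutes that one SNP test used to take
\text{speed-up} = \frac{100 \times 10^{5}\ \text{SNP tests}}{1\ \text{SNP test}} = 10^{7}
```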
45 The GenABEL project: implementing the agile methodology framework. Open source. Free (as in freedom). Collaborative, open, aiming to provide an agile environment.
47 Stay tuned!
48 Method/algorithmic similarities
Animal breeding literature/methods from the 1970s. FMM of W. Astle and D. Balding ('08) (MixABEL, '10). FaST-LMM of Lippert et al. ('12); GEMMA of Zhou & Stephens ('12). FASTA-like: EMMAX of Kang et al. ('10), P3D of Zhang et al. ('12).
49 YuriiA consulting
More informationQuantitative characters - exercises
Quantitative characters - exercises 1. a) Calculate the genetic covariance between half sibs, expressed in the ij notation (Cockerham's notation), when up to loci are considered. b) Calculate the genetic
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April
More informationGBLUP and G matrices 1
GBLUP and G matrices 1 GBLUP from SNP-BLUP We have defined breeding values as sum of SNP effects:! = #$ To refer breeding values to an average value of 0, we adopt the centered coding for genotypes described
More informationMultilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2
Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do
More informationA Robust Test for Two-Stage Design in Genome-Wide Association Studies
Biometrics Supplementary Materials A Robust Test for Two-Stage Design in Genome-Wide Association Studies Minjung Kwak, Jungnam Joo and Gang Zheng Appendix A: Calculations of the thresholds D 1 and D The
More informationInference using structural equations with latent variables
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationDistinctive aspects of non-parametric fitting
5. Introduction to nonparametric curve fitting: Loess, kernel regression, reproducing kernel methods, neural networks Distinctive aspects of non-parametric fitting Objectives: investigate patterns free
More informationGene mapping in model organisms
Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2
More informationRemark 3.2. The cross product only makes sense in R 3.
3. Cross product Definition 3.1. Let v and w be two vectors in R 3. The cross product of v and w, denoted v w, is the vector defined as follows: the length of v w is the area of the parallelogram with
More informationFlexible phenotype simulation with PhenotypeSimulator Hannah Meyer
Flexible phenotype simulation with PhenotypeSimulator Hannah Meyer 2018-03-01 Contents Introduction 1 Work-flow 2 Examples 2 Example 1: Creating a phenotype composed of population structure and observational
More informationProperties of the least squares estimates
Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationHierarchical generalized linear models a Lego approach to mixed models
Hierarchical generalized linear models a Lego approach to mixed models Lars Rönnegård Högskolan Dalarna Swedish University of Agricultural Sciences Trondheim Seminar Aliations Hierarchical Generalized
More informationMatrix Factorizations
1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular
More informationThe concept of breeding value. Gene251/351 Lecture 5
The concept of breeding value Gene251/351 Lecture 5 Key terms Estimated breeding value (EB) Heritability Contemporary groups Reading: No prescribed reading from Simm s book. Revision: Quantitative traits
More informationEstimation of the Angular Density in Multivariate Generalized Pareto Models
in Multivariate Generalized Pareto Models René Michel michel@mathematik.uni-wuerzburg.de Institute of Applied Mathematics and Statistics University of Würzburg, Germany 18.08.2005 / EVA 2005 The Multivariate
More information