GLIDE: GPU-based LInear Detection of Epistasis


1 GLIDE: GPU-based LInear Detection of Epistasis Chloé-Agathe Azencott with Tony Kam-Thong, Lawrence Cayton, and Karsten Borgwardt Machine Learning and Computational Biology Research Group Max Planck Institute for Developmental Biology & Max Planck Institute for Intelligent Systems Tübingen, Germany May 16, 2012 C.-A. Azencott GLIDE May 16,

2 GWAS: Genome-Wide Association Studies
[Figure: aligned sequences (...ATTACGTACGAT..., ...ATTAGGTACGAT..., ...) with axes labeled SNPs and subjects]
Which SNPs explain the phenotype?

3 GWAS Variables
Phenotype and genotype can be binary, discrete or continuous.

             Phenotype         SNPs values
Binary       Sick / Not sick   Homozygous SNPs
Discrete     Eye color         {0, 1, 2} encoding
Continuous   Height            Imputed values

{0, 1, 2} allele count encoding for alleles AA, Aa, aa
{ 1, 1, 1} dominance encoding for alleles AA, Aa, aa
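
As a small illustration of the allele count encoding, the sketch below counts copies of one allele in a two-letter genotype. Which allele is counted (here `a`) and the helper name `allele_count` are assumptions for the example; the slide does not say which allele is the reference.

```python
def allele_count(genotype, counted_allele="a"):
    """Return how many copies of `counted_allele` a two-letter genotype holds."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'Aa'")
    return sum(1 for allele in genotype if allele == counted_allele)


# Encode the three possible genotypes at a bi-allelic SNP.
genotypes = ["AA", "Aa", "aa"]
encoded = [allele_count(g) for g in genotypes]
print(encoded)
```

Imputed genotypes would replace these integer counts with fractional values, which is why a method that accepts continuous genotypes is useful.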

4 Missing Heritability and Epistasis
Single-locus GWAS fail to explain all the heritability of most complex traits:
- rare variants, undetected SNPs with small effect size...
- interactions between SNPs [Manolio et al. 2010, Zuk et al. 2012]
Known synergetic effects between genes:
- Enhance/suppress cancer mutations [Ashworth et al. 2011]: loss of VHL (tumor suppressor) causes cellular senescence, unless Retinoblastoma (another tumor suppressor) is also inactivated
- Working memory related brain activation [Tan et al. 2007]: GRM3 has an adverse effect on prefrontal engagement only in the presence of one variant of COMT
Map pairs of SNPs to the phenotype

5 State of the Art
SNP pairs: computational burden, statistical issues
[Image: IC 1101, the biggest known galaxy]
Reduce the search space
- Two-stage approaches: only consider SNPs from single-locus GWAS [Zhang et al. 2007], relevant pathways, underlying PPI [Emily et al. 2009]

6 State of the Art
Reduce the search space
- Space-pruning techniques
  - FastANOVA: branch-and-bound on SNPs [Zhang et al. 2008]
  - TEAM: efficient updates of contingency tables [Zhang et al. 2010]
- Sampling approaches
  - BEAM [Zhang et al. 2007]: MCMC sampling
  - Random forests [Lunetta et al. 2004]
  - Lightbulb [Achlioptas et al. 2011]
All limited to binary or discrete phenotypes/genotypes

7 State of the Art
Use Graphics Processing Units (GPUs)
- Build images fast for display
- Highly parallelizable, simple functions
- SHEsisEpi [Hu et al., 2010], EPIBLASTER [Kam-Thong et al., 2011], EPIGPUHSIC [Kam-Thong et al., 2011], GBOOST [Young et al., 2011], EpiGPU [Hemani et al., 2011]
Drawbacks:
- Limited to binary or discrete phenotypes/genotypes
- Neglect main effects
- Reduced interpretability

8 GLIDE
GPU-based linear regression for the detection of epistasis
$\text{Phenotype} = \alpha\,\text{SNP}_1 + \beta\,\text{SNP}_2 + \gamma\,\text{SNP}_1\,\text{SNP}_2 + \delta$
Is $\gamma$ significantly different from 0? Use a t-test.
- Both phenotype and genotype can be continuous
- Main effects are accounted for
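
For reference, the t-test on the interaction coefficient can be written out as follows. This is the textbook ordinary-least-squares form, given here as an assumption about what the slide intends rather than a transcription of it. Here $X$ is the design matrix over $m$ subjects with columns $\text{SNP}_1$, $\text{SNP}_2$, $\text{SNP}_1\,\text{SNP}_2$ and a constant, and $\hat{\theta} = (\hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\delta})$.

```latex
\hat{\theta} = (X^\top X)^{-1} X^\top y, \qquad
\hat{\sigma}^2 = \frac{\lVert y - X\hat{\theta} \rVert^2}{m - 4}, \qquad
t_\gamma = \frac{\hat{\gamma}}{\sqrt{\hat{\sigma}^2 \,\bigl[(X^\top X)^{-1}\bigr]_{\gamma\gamma}}}
```

Under the null hypothesis $\gamma = 0$ and the usual Gaussian noise assumption, $t_\gamma$ follows a Student t distribution with $m - 4$ degrees of freedom.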

9 GPU Implementation
$m \times n$ subject-SNP matrix $X$, phenotype $y \in \mathbb{R}^m$
Each SNP is a column $x_i \in \mathbb{R}^m$
Each thread looks at one SNP pair $(x_i, x_j)$
- define: $X_{ij} = [\,\mathbf{1} \;\; x_i \;\; x_j \;\; x_i \circ x_j\,]$
- solve: $X_{ij}\,\alpha_{ij} \approx y$ by $\alpha_{ij} = (X_{ij}^\top X_{ij})^{-1} X_{ij}^\top y$
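
The per-pair computation can be sketched on the CPU in plain Python. This is a minimal illustration of the normal equations for a single SNP pair, not the GPU kernel itself; the function names, the Gauss-Jordan inversion (the talk describes an analytic inverse instead) and the toy data are all assumptions made for the example.

```python
import math


def solve_inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design: the SNP pair is collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_pair(xi, xj, y):
    """Least-squares fit of y on [1, xi, xj, xi*xj].

    Returns (coefficients, t-score of the interaction term).
    """
    rows = [[1.0, a, b, a * b] for a, b in zip(xi, xj)]
    m, p = len(rows), 4
    gram = [[sum(r[u] * r[v] for r in rows) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * t for r, t in zip(rows, y)) for u in range(p)]
    inv = solve_inverse(gram)
    coef = [sum(inv[u][v] * xty[v] for v in range(p)) for u in range(p)]
    resid = [t - sum(c * v for c, v in zip(coef, r)) for r, t in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (m - p)  # residual variance
    t_inter = coef[3] / math.sqrt(sigma2 * inv[3][3])
    return coef, t_inter


# Toy data: a 3 x 3 grid of genotypes with a built-in interaction of 2.0.
grid = [(x, z) for x in (0, 1, 2) for z in (0, 1, 2)]
xi = [float(x) for x, _ in grid]
xj = [float(z) for _, z in grid]
noise = [0.05 * (-1) ** (x + z) for x, z in grid]
y = [1.0 + 0.5 * x - 0.3 * z + 2.0 * x * z + e for (x, z), e in zip(grid, noise)]

coef, t_inter = fit_pair(xi, xj, y)
print("interaction estimate:", coef[3], "t-score:", t_inter)
```

Note that the coefficient order here is intercept, SNP 1, SNP 2, interaction, following the column order of $X_{ij}$.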

10 GPU Implementation
Threads are grouped in blocks that use a subset of columns of $X$
Each block $(k, l)$ of size $B \times B$ looks at the interactions between SNPs from $\{x_{kB+1}, x_{kB+2}, \dots, x_{kB+B}\}$ and $\{x_{lB+1}, x_{lB+2}, \dots, x_{lB+B}\}$
Phase I
- compute $A_{k,l} = [\,x_{kB+1} \dots x_{kB+B} \;\; x_{lB+1} \dots x_{lB+B} \;\; x^2_{kB+1} \dots x^2_{kB+B} \;\; x^2_{lB+1} \dots x^2_{lB+B} \;\; \mathbf{1} \;\; y\,]$
- store $T_{k,l} = A_{k,l}^\top A_{k,l}$ in shared memory

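11 GPU Implementation
Phase II
- recover $X_{ij}^\top X_{ij}$ as
$$X_{ij}^\top X_{ij} = \begin{pmatrix}
m & x_i^\top \mathbf{1} & x_j^\top \mathbf{1} & (x_i \circ x_j)^\top \mathbf{1} \\
x_i^\top \mathbf{1} & x_i^\top x_i & x_i^\top x_j & x_i^\top (x_i \circ x_j) \\
x_j^\top \mathbf{1} & x_j^\top x_i & x_j^\top x_j & x_j^\top (x_i \circ x_j) \\
(x_i \circ x_j)^\top \mathbf{1} & (x_i \circ x_j)^\top x_i & (x_i \circ x_j)^\top x_j & (x_i \circ x_j)^\top (x_i \circ x_j)
\end{pmatrix}$$
- invert $X_{ij}^\top X_{ij}$ of dimension $4 \times 4$ analytically
- estimate the regression coefficients as $(X_{ij}^\top X_{ij})^{-1} X_{ij}^\top y$
- compute the estimated phenotype, residual, t-scores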
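
The idea behind the two phases can be checked on a single pair: every entry of the 4 x 4 matrix is an inner product between columns of the form 1, x, x squared, so it can be read off a precomputed table of pairwise products instead of being recomputed per thread. The sketch below is a CPU illustration with one SNP per group (B = 1); the column order, helper names and toy data are assumptions. The interaction entry of the right-hand side, sum(xi * xj * y), is a triple product and is computed directly here, since the extracted slide text does not show how that term is obtained.

```python
def gram(columns):
    """Return the matrix of pairwise inner products between the given columns."""
    return [[sum(a * b for a, b in zip(u, v)) for v in columns] for u in columns]


def squared(v):
    return [a * a for a in v]


def pair_system_from_table(xi, xj, y):
    """Assemble X^T X and X^T y for X = [1, xi, xj, xi*xj] from a product table."""
    ones = [1.0] * len(y)
    # Hypothetical column order: 0: 1, 1: xi, 2: xj, 3: xi^2, 4: xj^2, 5: y
    T = gram([ones, xi, xj, squared(xi), squared(xj), y])
    m = T[0][0]
    s_i, s_j = T[0][1], T[0][2]      # sum xi, sum xj
    s_ii, s_jj = T[0][3], T[0][4]    # sum xi^2, sum xj^2
    s_ij = T[1][2]                   # sum xi*xj
    s_iij = T[3][2]                  # sum xi^2*xj
    s_ijj = T[1][4]                  # sum xi*xj^2
    s_iijj = T[3][4]                 # sum xi^2*xj^2
    G = [[m,    s_i,   s_j,   s_ij],
         [s_i,  s_ii,  s_ij,  s_iij],
         [s_j,  s_ij,  s_jj,  s_ijj],
         [s_ij, s_iij, s_ijj, s_iijj]]
    # First three entries come from the table; the last is computed directly.
    xty = [T[0][5], T[1][5], T[2][5],
           sum(a * b * c for a, b, c in zip(xi, xj, y))]
    return G, xty


xi = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
xj = [1.0, 1.0, 0.0, 2.0, 2.0, 0.0]
y = [0.3, 1.1, 0.2, 4.0, 0.5, 0.1]
G, xty = pair_system_from_table(xi, xj, y)
print(G)
```

With B SNPs per group, the same table serves all B x B pairs of a block, which is presumably why it is kept in shared memory.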

12 Statistical Significance
Multiple Hypothesis Testing correction
- Bonferroni correction: overly conservative (linked markers)
- Permutation testing: computationally intractable
- [Becker et al. 2011] MC-simulations correction factor: $0.4\,m$, with $m = n(n-1)/2$ SNP pairs
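
To give a sense of scale, the sketch below counts SNP pairs and derives a per-test threshold. Reading the slide's "0.4 m" as an effective number of independent tests is an assumption, as are the significance level `alpha = 0.05` and the function names.

```python
def n_pairs(n_snps):
    """Number of unordered SNP pairs, n(n-1)/2."""
    return n_snps * (n_snps - 1) // 2


def corrected_threshold(n_snps, alpha=0.05, factor=0.4):
    """Per-test p-value threshold using factor * m effective tests (assumed reading)."""
    effective_tests = factor * n_pairs(n_snps)
    return alpha / effective_tests


for n in (5_000, 1_000_000):
    print(n, "SNPs:", n_pairs(n), "pairs, threshold", corrected_threshold(n))
```

Because the effective number of tests is smaller than m, the resulting threshold is less strict than plain Bonferroni by a factor of 1/0.4 = 2.5.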

13 Runtime Performance
Synthetic data: 1,000 subjects, 5,000 SNPs
GPUs: NVIDIA GTX 580 ($450)

GLIDE Speed Performance
Method                 Runtime   GLIDE speedup
PLINK                  s
FastEpistasis / node   s         280
GLIDE / GPU            5 s

[Figure: Speed vs. Number of Subjects; '000 interactions/sec against # of subjects for GLIDE, FastEpistasis and PLINK]

14 Hippocampus Volume Epistasis Detection
Hippocampus
- involved in many cognitive processes (e.g. formation of new memories)
- volume reduction: Alzheimer's disease, schizophrenia, recurrent depression
- volume known to be heritable
- GWAS study: 567 genotyped subjects, about $10^6$ SNPs

15 Hippocampus Volume Epistasis Detection
Single-locus GWAS
- 20 SNPs with significant main effects
- 14 associated with hippocampal morphology and brain maturation
- explain 18% of the variance
Two-locus GWAS
- Runtime: 3 days on a single GPU
- 20 pairs with lowest p-values ( )
- No significant main effects
- 8 independent pairs, explain 40% of the variance
Together explain 50% of the variance
Low MAF: potentially driven by small number of outliers

16 Hippocampus Volume Epistasis Detection
SNPs close to genes linked to:
- ICOS, CTLA4: neurogenesis and neural plasticity
- ZEB2: hippocampal development regulation
- ZPLD1: cerebral malformations
- TRPM6: cation channels, expressed in the brain
- PCDH8: cell adhesion in the central nervous system
[Figure: Q-Q plot]

17 Future Work
- Additive & dominance effects:
  $y = \alpha_1 x_1 + \beta_1 z_1 + \alpha_2 x_2 + \beta_2 z_2 + \gamma_{aa} x_1 x_2 + \gamma_{ad} x_1 z_2 + \gamma_{da} z_1 x_2 + \gamma_{dd} z_1 z_2 + \delta$
- Assessment of significance: permutation tests on GPU
- Population structure correction for epistasis on GPU
  - Eigenstrat approach: add large eigenvectors of the kinship matrix as covariates
  - Linear Mixed Models (EMMA, FaST-LMM): $y \sim \mathcal{N}(X\beta;\ \sigma_g^2 K + \sigma_e^2 I)$
- Hippocampal volume: per cytoarchitectonic subregion
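
The additive and dominance model has nine regressors per subject instead of four. A minimal sketch of one design-matrix row is shown below; it assumes, from the subscripts on the gamma terms, that x is the additive encoding and z the dominance encoding of each SNP. The function names are hypothetical.

```python
def design_row(x1, z1, x2, z2):
    """Regressors in the order of the model equation, with the intercept last."""
    return [x1, z1, x2, z2,
            x1 * x2,  # additive x additive
            x1 * z2,  # additive x dominance
            z1 * x2,  # dominance x additive
            z1 * z2,  # dominance x dominance
            1.0]


def predict(coefs, x1, z1, x2, z2):
    """Evaluate the model for one subject given the nine coefficients."""
    return sum(c * v for c, v in zip(coefs, design_row(x1, z1, x2, z2)))


print(design_row(2, 0, 1, 1))
```

Testing epistasis in this model means testing four interaction coefficients per pair rather than one.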

18 Other Ongoing Projects
- Learning from data with missing information
- Long-range SNP correlation & genetic interactions
- Disease gene prediction from gene networks
- Analysis of clinical data (mood disorders and immunology)

19 Acknowledgements
Karsten Borgwardt, Bernhard Schölkopf, Bertram Müller-Myhsok, Detlef Weigel
Tony Kam-Thong
Lawrence Cayton
Philipp Sämann
Benno Pütz, André Altmann
Theofanis Karaletsos
Alexander von Humboldt Stiftung
T. Kam-Thong, C.-A. Azencott, L. Cayton, B. Pütz, A. Altmann, P. Sämann, B. Schölkopf, B. Müller-Myhsok and K. Borgwardt. GLIDE: GPU-based linear regression for detection of epistasis, submitted
GLIDE is available at


More information

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru

More information

ARTICLE MASTOR: Mixed-Model Association Mapping of Quantitative Traits in Samples with Related Individuals

ARTICLE MASTOR: Mixed-Model Association Mapping of Quantitative Traits in Samples with Related Individuals ARTICLE MASTOR: Mixed-Model Association Mapping of Quantitative Traits in Samples with Related Individuals Johanna Jakobsdottir 1,3 and Mary Sara McPeek 1,2, * Genetic association studies often sample

More information

Gene mapping in model organisms

Gene mapping in model organisms Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2

More information

The Role of Network Science in Biology and Medicine. Tiffany J. Callahan Computational Bioscience Program Hunter/Kahn Labs

The Role of Network Science in Biology and Medicine. Tiffany J. Callahan Computational Bioscience Program Hunter/Kahn Labs The Role of Network Science in Biology and Medicine Tiffany J. Callahan Computational Bioscience Program Hunter/Kahn Labs Network Analysis Working Group 09.28.2017 Network-Enabled Wisdom (NEW) empirically

More information

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017 Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 46 Lecture Overview 1. Variance Component

More information

Designer Genes C Test

Designer Genes C Test Northern Regional: January 19 th, 2019 Designer Genes C Test Name(s): Team Name: School Name: Team Number: Rank: Score: Directions: You will have 50 minutes to complete the test. You may not write on the

More information

Evolution and Epigenetics. Seminar: Social, Cognitive and Affective Neuroscience Speaker: Wolf-R. Brockhaus

Evolution and Epigenetics. Seminar: Social, Cognitive and Affective Neuroscience Speaker: Wolf-R. Brockhaus Evolution and Epigenetics Seminar: Social, Cognitive and Affective Neuroscience Speaker: Wolf-R. Brockhaus 1. History of evolutionary theory The history of evolutionary theory ~ 1800: Lamarck 1859: Darwin's

More information

Computational Approaches to Statistical Genetics

Computational Approaches to Statistical Genetics Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen

More information

AP Biology Essential Knowledge Cards BIG IDEA 1

AP Biology Essential Knowledge Cards BIG IDEA 1 AP Biology Essential Knowledge Cards BIG IDEA 1 Essential knowledge 1.A.1: Natural selection is a major mechanism of evolution. Essential knowledge 1.A.4: Biological evolution is supported by scientific

More information

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

More information

A FAST, ACCURATE TWO-STEP LINEAR MIXED MODEL FOR GENETIC ANALYSIS APPLIED TO REPEAT MRI MEASUREMENTS

A FAST, ACCURATE TWO-STEP LINEAR MIXED MODEL FOR GENETIC ANALYSIS APPLIED TO REPEAT MRI MEASUREMENTS A FAST, ACCURATE TWO-STEP LINEAR MIXED MODEL FOR GENETIC ANALYSIS APPLIED TO REPEAT MRI MEASUREMENTS Qifan Yang 1,4, Gennady V. Roshchupkin 2, Wiro J. Niessen 2, Sarah E. Medland 3, Alyssa H. Zhu 1, Paul

More information

Heredity and Genetics WKSH

Heredity and Genetics WKSH Chapter 6, Section 3 Heredity and Genetics WKSH KEY CONCEPT Mendel s research showed that traits are inherited as discrete units. Vocabulary trait purebred law of segregation genetics cross MAIN IDEA:

More information

27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1

27: Case study with popular GM III. 1 Introduction: Gene association mapping for complex diseases 1 10-708: Probabilistic Graphical Models, Spring 2015 27: Case study with popular GM III Lecturer: Eric P. Xing Scribes: Hyun Ah Song & Elizabeth Silver 1 Introduction: Gene association mapping for complex

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

G E INTERACTION USING JMP: AN OVERVIEW

G E INTERACTION USING JMP: AN OVERVIEW G E INTERACTION USING JMP: AN OVERVIEW Sukanta Dash I.A.S.R.I., Library Avenue, New Delhi-110012 sukanta@iasri.res.in 1. Introduction Genotype Environment interaction (G E) is a common phenomenon in agricultural

More information

Allele interactions: Terms used to specify interactions between alleles of the same gene: Dominant/recessive incompletely dominant codominant

Allele interactions: Terms used to specify interactions between alleles of the same gene: Dominant/recessive incompletely dominant codominant Biol 321 Feb 3, 2010 Allele interactions: Terms used to specify interactions between alleles of the same gene: Dominant/recessive incompletely dominant codominant Gene interactions: the collaborative efforts

More information

Package LBLGXE. R topics documented: July 20, Type Package

Package LBLGXE. R topics documented: July 20, Type Package Type Package Package LBLGXE July 20, 2015 Title Bayesian Lasso for detecting Rare (or Common) Haplotype Association and their interactions with Environmental Covariates Version 1.2 Date 2015-07-09 Author

More information

TEST SUMMARY AND FRAMEWORK TEST SUMMARY

TEST SUMMARY AND FRAMEWORK TEST SUMMARY Washington Educator Skills Tests Endorsements (WEST E) TEST SUMMARY AND FRAMEWORK TEST SUMMARY BIOLOGY Copyright 2014 by the Washington Professional Educator Standards Board 1 Washington Educator Skills

More information

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate Natural Selection Population Dynamics Humans, Sickle-cell Disease, and Malaria How does a population of humans become resistant to malaria? Overproduction Environmental pressure/competition Pre-existing

More information

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 Homework 4 (version 3) - posted October 3 Assigned October 2; Due 11:59PM October 9 Problem 1 (Easy) a. For the genetic regression model: Y

More information

Introduction to Quantitative Genetics. Introduction to Quantitative Genetics

Introduction to Quantitative Genetics. Introduction to Quantitative Genetics Introduction to Quantitative Genetics Historical Background Quantitative genetics is the study of continuous or quantitative traits and their underlying mechanisms. The main principals of quantitative

More information

Chapter 17: Population Genetics and Speciation

Chapter 17: Population Genetics and Speciation Chapter 17: Population Genetics and Speciation Section 1: Genetic Variation Population Genetics: Normal Distribution: a line graph showing the general trends in a set of data of which most values are near

More information

REQUIREMENTS FOR THE BIOCHEMISTRY MAJOR

REQUIREMENTS FOR THE BIOCHEMISTRY MAJOR REQUIREMENTS FOR THE BIOCHEMISTRY MAJOR Grade Requirement: All courses required for the Biochemistry major (CH, MATH, PHYS, BI courses) must be graded and passed with a grade of C- or better. Core Chemistry

More information

4/19/10 More complications to Mendel

4/19/10 More complications to Mendel 4/19/10 More complications to Mendel Complications to the relationship between genotype to phenotype Commentary written in response to the release of the first draft of the human genome sequence From Science

More information