A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction

1 A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction
Sangseob Leem, Hye-Young Jung, Sungyoung Lee and Taesung Park
Bioinformatics and Biostatistics Lab, Seoul National University

2 Contents 1. Introduction 2. Motivation 3. Method 4. Results 5. Conclusion

3 Interaction
In a single-locus association study, each SNP tested on its own shows no effect.
This is one possible reason for the missing heritability.

4 MDR method
Ritchie, M.D., et al. (2001) Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer, Am. J. Hum. Genet., 69.
Input: SNP 1, SNP 2, Class
Steps:
- Calculate the case-control ratio for each genotype combination
- Identify high-/low-risk combinations
- Build a 2 × 2 confusion matrix
Confusion matrix (rows: predicted class, columns: true class):
                     true positive   true negative
predicted positive   TP              FP
predicted negative   FN              TN
[Figure residue: Case/Control by High risk/Low risk count tables; only the counts 12 and 4 survive]
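The three steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the tie rule (a cell is high risk when its case/control ratio is at least the overall ratio) are assumptions.

```python
from collections import Counter

def mdr_confusion(genotypes, labels):
    """Classic MDR for one SNP pair.

    genotypes: list of (snp1, snp2) genotype tuples, one per individual.
    labels:    1 for case, 0 for control.
    Returns (TP, FP, FN, TN).
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    tp = fp = fn = tn = 0
    for cell in set(cases) | set(controls):
        c1, c0 = cases[cell], controls[cell]
        # Assumed rule: high risk if c1/c0 >= n_case/n_ctrl,
        # written without division so empty control cells are safe.
        if c1 * n_ctrl >= c0 * n_case:
            tp, fp = tp + c1, fp + c0
        else:
            fn, tn = fn + c1, tn + c0
    return tp, fp, fn, tn

def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Hypothetical toy data: two genotype cells, 4 cases and 4 controls.
toy_genotypes = [(0, 0)] * 4 + [(1, 1)] * 4
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_result = mdr_confusion(toy_genotypes, toy_labels)
```

Every individual in a cell is counted fully in one risk group, which is the binary behaviour the later slides relax.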

5 Weaknesses of MDR
- Biological meaning: are all possible genotype interaction models really possible in the real world?
  Log-linear model based MDR (Lee et al. 2007)
- Computation time: increases exponentially with the interaction order
  Filtering-based approaches: Relief, ReliefF, TuRF, SURF
  Processing: MDR GPU, cumdr
- Binary classification, (# of cases, # of controls): (2, 1) vs (20, 10), (1, 11) vs (10, 20)
  See next slide

6 Approaches to overcome simple binary classification
- Model-based MDR, (# of cases, # of controls): (2, 1) vs (20, 10)
  Calle, M.L., et al. (2008) MB-MDR: model-based multifactor dimensionality reduction for detecting interactions in high-dimensional genomic data.
  Ternary classification: high, low and no-evidence groups
- wba MDR, (# of cases, # of controls): (11, 1) vs (20, 10)
  Namkung, J., et al. (2009) New evaluation measures for multifactor dimensionality reduction classifiers in gene-gene interaction analysis, Bioinformatics, 25.
  Weighted balanced accuracy MDR

7 Fuzzy set theory
- Extension of classical set theory
  Zadeh, L.A. (1965) Fuzzy sets, Information and Control, 8.
- Degrees of membership: "rich or poor" vs "degree of rich"
  Example: 1/10/100/1000 dollars for a day
[Figure: poor/rich membership plots]

8 Key difference
[Table residue: three Case/Control by High risk/Low risk tables, one each for Original MDR, wba MDR and Fuzzy MDR. Surviving entries: wba MDR, +1 × 2.5; Fuzzy MDR, High risk +1 × 0.05 and Low risk +1 × 0.95]

9 Simple example
[Table residue: two Case/Control by High risk/Low risk tables; cell values not preserved]

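10 Criteria (x) of membership degree ($\mu_H$, $\mu_L$)
The estimate of the odds ratio (OR):
$\hat{\theta}_i = \dfrac{n_{i1} / n_{i0}}{n_{+1} / n_{+0}}$
- $n_{ij}$: the number of individuals with the $i$-th multi-locus genotype in the $j$-th disease group
- $n_{+j}$: the total number of individuals in the $j$-th disease group
- $i$ runs over the multi-locus genotypes (for example, $3^2 = 9$ for two loci); $j = 1$ for case and $j = 0$ for control
- (# of cases, # of controls): (2, 1) vs (20, 10)
Standardization:
$z = \dfrac{\log(\hat{\theta}_i)}{SE}$, $\quad SE = \sqrt{\dfrac{1}{n_{i1}} - \dfrac{1}{n_{+1}} + \dfrac{1}{n_{i0}} - \dfrac{1}{n_{+0}}}$
$\log \hat{\theta}_i = \log \dfrac{n_{i1} / n_{i0}}{n_{+1} / n_{+0}} = \log \dfrac{n_{i1}}{n_{+1}} - \log \dfrac{n_{i0}}{n_{+0}}$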
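To see why standardization separates (2, 1) from (20, 10), the two criteria can be computed directly. The sketch below assumes the standard error has the delta-method form SE = sqrt(1/n_i1 - 1/n_+1 + 1/n_i0 - 1/n_+0), which is a reading of the garbled slide formula; the group totals of 100 cases and 100 controls are hypothetical, and empty cells are not handled.

```python
import math

def odds_ratio_and_z(n_i1, n_i0, n_case, n_ctrl):
    """Return (theta_hat, z) for one multi-locus genotype cell.

    n_i1, n_i0:      cases and controls carrying the genotype (both > 0).
    n_case, n_ctrl:  total cases and controls.
    """
    theta = (n_i1 / n_i0) / (n_case / n_ctrl)
    # Assumed SE of log(theta_hat): variance of the log of two proportions.
    se = math.sqrt(1 / n_i1 - 1 / n_case + 1 / n_i0 - 1 / n_ctrl)
    return theta, math.log(theta) / se

# Hypothetical totals: 100 cases and 100 controls.
theta_small, z_small = odds_ratio_and_z(2, 1, 100, 100)
theta_large, z_large = odds_ratio_and_z(20, 10, 100, 100)
```

Both cells have the same estimated OR of 2, but the cell with ten times as many individuals gets the larger z, so the standardized criterion reflects how much evidence supports the ratio.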

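11 Membership function
[Figure: membership curves for Original MDR and Fuzzy MDR]
Linear membership function:
$\mu_H(x) = \begin{cases} 0 & x < t_l \\ \dfrac{x - t_l}{t_h - t_l} & t_l \le x < t_h \\ 1 & x \ge t_h \end{cases}$, $\quad \mu_L(x) = 1 - \mu_H(x)$
Sigmoid membership function:
$\mu_H(x) = \begin{cases} 0 & x < t_l \\ \dfrac{1}{1 + \exp(\cdot)} & t_l \le x < t_h \\ 1 & x \ge t_h \end{cases}$, $\quad \mu_L(x) = 1 - \mu_H(x)$
[The argument of the exponential, a function of $x$, $t_h$ and $t_l$, is not recoverable from the transcription]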
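A short Python sketch of the linear membership function, next to a crisp single-threshold rule for comparison. The function names and the thresholds in the usage lines are hypothetical; the sigmoid variant is omitted because its exact form is not preserved in the transcription.

```python
def mu_high_crisp(x, t):
    """Crisp membership: a genotype is either fully high risk or not."""
    return 1.0 if x >= t else 0.0

def mu_high_linear(x, t_l, t_h):
    """Linear membership in the high-risk set, rising from 0 at t_l to 1 at t_h."""
    if x < t_l:
        return 0.0
    if x >= t_h:
        return 1.0
    return (x - t_l) / (t_h - t_l)

def mu_low(mu_high_value):
    """Membership in the low-risk set is the complement."""
    return 1.0 - mu_high_value

# Hypothetical thresholds t_l = 0.5 and t_h = 2.0 on the OR scale.
halfway = mu_high_linear(1.25, 0.5, 2.0)
```

With these assumed thresholds, a criterion value of 1.25 belongs to both risk groups with degree 0.5, whereas the crisp rule would force it entirely into one group.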

12 Tuning Parameters
Notation: F(membership function, standardization, weight, threshold); 80 (2*2*4*5) combinations
- Membership function: l for the linear membership function, s for the sigmoid membership function
- Standardization: 0 for OR, 1 for z
- Weights: w_i = 1 + ln(OR) i, i = 0, 0.5, 1, 2
- Threshold values: 2, 4, 8, 16 and 32 for OR; [five values lost in extraction] for z
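The count of 80 settings follows from a Cartesian product of the four parameter lists. In the sketch below the variable names are hypothetical, and thresholds are represented by an index from 1 to 5, an assumption consistent with the setting F(L,0,0,3) being described later as threshold OR = 8.

```python
from itertools import product

membership_functions = ["l", "s"]      # linear, sigmoid
standardizations = [0, 1]              # 0: OR, 1: z
weight_values = [0, 0.5, 1, 2]
threshold_indices = [1, 2, 3, 4, 5]    # assumed 1-based index into the threshold list

# Every combination F(membership, standardization, weight, threshold).
parameter_grid = list(product(membership_functions,
                              standardizations,
                              weight_values,
                              threshold_indices))
```

Each tuple in `parameter_grid` is one candidate setting that a simulation study could evaluate.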

13 Fuzzy MDR procedure(1) Consistent case/control ratio In two loci interactions,?c

14 Fuzzy MDR procedure (2)
[Figure: Original MDR vs Fuzzy MDR]
Membership degrees depend on parameter values.
$TP = \sum_i n_{i1}\, \mu_H(x_i)$, $\quad FN = \sum_i n_{i1}\, \mu_L(x_i)$
$FP = \sum_i n_{i0}\, \mu_H(x_i)$, $\quad TN = \sum_i n_{i0}\, \mu_L(x_i)$
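The four sums translate directly into code. This is a sketch under stated assumptions: the cell counts, the criterion values and the linear membership with thresholds 0.5 and 2.0 are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def fuzzy_confusion(cells, mu_high):
    """Fuzzy confusion matrix.

    cells:   list of (n_i1, n_i0, x_i) = (cases, controls, criterion value).
    mu_high: function mapping x_i to a high-risk membership degree in [0, 1].
    Returns (TP, FP, FN, TN); each cell contributes to both risk groups.
    """
    tp = fp = fn = tn = 0.0
    for n_i1, n_i0, x_i in cells:
        high = mu_high(x_i)
        low = 1.0 - high
        tp += n_i1 * high
        fn += n_i1 * low
        fp += n_i0 * high
        tn += n_i0 * low
    return tp, fp, fn, tn

def linear_membership(x):
    """Hypothetical linear membership with t_l = 0.5 and t_h = 2.0."""
    return max(0.0, min(1.0, (x - 0.5) / 1.5))

# Hypothetical cells: clearly high, clearly low, and ambiguous.
example_cells = [(20, 10, 2.0), (10, 20, 0.5), (6, 6, 1.25)]
fuzzy_counts = fuzzy_confusion(example_cells, linear_membership)

# A crisp 0/1 membership recovers the original MDR counts.
crisp_counts = fuzzy_confusion(example_cells, lambda x: 1.0 if x >= 1.0 else 0.0)
```

The ambiguous cell with criterion 1.25 is split evenly between the two groups under the fuzzy rule, while the crisp rule assigns all twelve of its individuals to the high-risk group.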

15 Empirical Studies
Experiments on simulation data
- Objectives
  To compare the power of Fuzzy MDR with that of original MDR and wba MDR
  To find optimal parameter values
- Data
  Without marginal effects
  With marginal effects
- Parameters F(membership function, standardization, weight, threshold)
  Linear/sigmoid, with/without SE, four weight values and five threshold values
Experiments on real data
- Bipolar disorder (BD) data from the Wellcome Trust Case Control Consortium (WTCCC)

16 Data without marginal effects
Structure
- Four sample sizes: 200, 400, 800 and 1600 samples
- 1000 SNPs
- Two causative SNPs
- 70 penetrance tables: 7 heritability values, 2 minor allele frequencies, 5 models
Example of penetrance table (Model 1): columns AA, Aa, aa; rows BB, Bb, bb [penetrance values not preserved]
Downloaded from [source not preserved]

17 Results for 200 samples [Figure; axes: heritability, MAF]

18 Results for 400 samples [Figure; axes: heritability, MAF]

19 Results for 800 samples [Figure; axes: heritability, MAF]

20 Results for 1600 samples [Figure; axes: heritability, MAF]

21 Data with marginal effects
Structure
- One sample size: 2000 cases and 2000 controls
- 1000 SNPs
- Two causative SNPs
- 18 penetrance tables: 3 models, 3 minor allele frequencies, 2 linkage disequilibrium values

Model 1    AA         Aa         aa
BB         1          1+θ        (1+θ)^2
Bb         1+θ        (1+θ)^2    (1+θ)^3
bb         (1+θ)^2    (1+θ)^3    (1+θ)^4

Model 2    AA         Aa         aa
BB         [row not preserved]
Bb         1          1+θ        (1+θ)^2
bb         1          (1+θ)^2    (1+θ)^4

Model 3    AA         Aa         aa
BB         [row not preserved]
Bb         1          1+θ        1+θ
bb         1          1+θ        1+θ
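Model 1 has a simple closed form: each entry is (1+θ) raised to the total number of minor alleles a and b. A minimal sketch, with a hypothetical function name and θ = 1 chosen only so that the entries are whole numbers:

```python
def model1_table(theta):
    """Model 1 effect table: entry is (1 + theta) ** (count of a + count of b).

    Rows are BB, Bb, bb (0, 1, 2 copies of b);
    columns are AA, Aa, aa (0, 1, 2 copies of a).
    """
    return [[(1 + theta) ** (a_count + b_count) for a_count in range(3)]
            for b_count in range(3)]

# Illustrative value theta = 1, so every step doubles the entry.
model1_example = model1_table(1)
```

Because each locus contributes its own factor regardless of the other, both SNPs have marginal effects under this model.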

22 Results for data with marginal effects [Figure; panels by Model, LD and MAF]

23 Real data
BD in WTCCC
- 1868 cases and 2938 controls
- 19 SNPs selected by a literature review
- Two parameter settings
  1. F(L,0,0,3): linear membership, without SE, without weight, threshold OR = 8
  2. F(S,1,1,2): sigmoid membership, with SE, w_1 = 1 + ln(OR) 1, threshold z = 2*1.96

[Table columns in the source: index, rs number, MAF, chromosome (position), gene, p-value (rank). The rs numbers, MAFs and positions are not preserved; "…" marks characters lost in extraction.]

index   gene                 p-value (rank)
1                            9.82E-06 (8)
2                            1.83E-05 (12)
3       DPP…                 …E-05 (10)
4       RNPEPL1              5.03E-06 (3)
5       CMTM8                1.45E-05 (11)
6       LAMP3                5.25E-06 (4)
7       SORCS2               1.13E-01 (17)
8       GLTSCR1L, LOC…       …E-06 (2)
9                            5.39E-05 (14)
10      DFNB…                …E-05 (13)
11      CACNA1C              9.72E-04 (15)
12      TSPAN8               7.22E-02 (16)
13      DGKH                 6.23E-01 (19)
14      SLC35F4              1.15E-05 (9)
15      TDRD9                7.69E-06 (6)
16      PALB2                1.33E-07 (1)
17                           9.18E-06 (7)
18      MYO5B                4.79E-01 (18)
19      CDC25B               7.47E-06 (5)

24 Result of BD in WTCCC, F(S,1,1,2)
[Table columns: order, SNP combination, training accuracy, testing accuracy, CVC. Values are largely lost; surviving SNP-combination fragments: ", , 6, , 6, 14, , 6, 9, 11,"]

index   gene                                  p-value (rank)
15      TDRD9 (Tudor Domain Containing 9)     7.69E-06 (6)
19      CDC25B (Cell Division Cycle 25B)      7.47E-06 (5)

25 Fuzzy MDR vs Original MDR (interaction model)
[Figure: interaction models identified by Fuzzy MDR and by Original MDR; model labels not preserved]
M11 has been discovered in the real world!

26 5. Conclusion
A novel and powerful Fuzzy MDR for gene-gene interaction analysis
- Based on fuzzy set theory: the H and L risk groups are fuzzy sets
- Original MDR is a special case of Fuzzy MDR
- More flexible interpretation through the degree of membership of each multi-locus genotype
- Potential for extension
Future work
- Determining the optimal tuning parameter values
- Extensions

27 Thank you.

28 References
Ritchie, M.D., et al. (2001) Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer, Am J Hum Genet, 69.
Velez, D.R., et al. (2007) A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction, Genet Epidemiol, 31.
Leem, S., et al. (2014) Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure, Computational Biology and Chemistry, 50.
Burton, P.R., et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls, Nature, 447.
Li, W. and Reich, J. (2000) A Complete Enumeration and Classification of Two-Locus Disease Models, Human Heredity, 50.

29 Limit of single-locus association studies
Penetrance table (MAF: 0.4, prevalence: 0.1)
- SNP_A genotypes: AA (0.36), Aa (0.48), aa (0.16)
- SNP_B genotypes: BB (0.36), Bb (0.48), bb (0.16)
- Marginal penetrances: P(SNP_A), P(SNP_B)
[Penetrance values not preserved]
Penetrances are the same across genotypes of SNP_A.
Penetrances are the same across genotypes of SNP_B.
Penetrances differ across genotype combinations of SNP_A and SNP_B.
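The slide's penetrance values are not preserved, so the following sketch builds a hypothetical table with the same property. It assumes the genotype frequencies shown (0.36, 0.48, 0.16), independence between the two SNPs, and a prevalence of 0.1; the contrast vector and the effect size are illustrative choices, not the authors' numbers.

```python
genotype_freq = [0.36, 0.48, 0.16]   # AA/Aa/aa and BB/Bb/bb at MAF 0.4

# Hypothetical contrast whose frequency-weighted mean is zero:
# 0.36 * 1.0 + 0.48 * (-0.75) + 0.16 * 0.0 = 0
contrast = [1.0, -0.75, 0.0]
prevalence, effect = 0.1, 0.1

# Penetrance for row b (SNP_B genotype) and column a (SNP_A genotype).
penetrance = [[prevalence + effect * contrast[b] * contrast[a] for a in range(3)]
              for b in range(3)]

# Marginal penetrance of each SNP_A genotype, averaging over SNP_B, and vice versa.
marginal_a = [sum(genotype_freq[b] * penetrance[b][a] for b in range(3))
              for a in range(3)]
marginal_b = [sum(genotype_freq[a] * penetrance[b][a] for a in range(3))
              for b in range(3)]
```

In this constructed table the joint penetrances range from 0.025 to 0.2, yet every marginal penetrance equals the prevalence, so a single-locus test sees no signal at either SNP.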

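30 Method: confusion matrix calculation
Here $m_{i1}$ and $m_{i0}$ are the membership degrees of genotype $i$ in the high- and low-risk groups, and $w_i$ is the weight of genotype $i$.
Original:
$wTP = \sum_{i \in high} n_{i1}$, $\quad wFP = \sum_{i \in high} n_{i0}$, $\quad wFN = \sum_{i \in low} n_{i1}$, $\quad wTN = \sum_{i \in low} n_{i0}$
Weighted:
$wTP = \sum_{i \in high} w_i n_{i1}$, $\quad wFP = \sum_{i \in high} w_i n_{i0}$, $\quad wFN = \sum_{i \in low} w_i n_{i1}$, $\quad wTN = \sum_{i \in low} w_i n_{i0}$
Fuzzy:
$wTP = \sum_{i \in high} m_{i1} n_{i1} + \sum_{i \in low} m_{i1} n_{i1}$, $\quad wFN = \sum_{i \in high} m_{i0} n_{i1} + \sum_{i \in low} m_{i0} n_{i1}$
$wFP = \sum_{i \in high} m_{i1} n_{i0} + \sum_{i \in low} m_{i1} n_{i0}$, $\quad wTN = \sum_{i \in high} m_{i0} n_{i0} + \sum_{i \in low} m_{i0} n_{i0}$
Weighted fuzzy:
$wTP = \sum_{i \in high} w_i m_{i1} n_{i1} + \sum_{i \in low} w_i m_{i1} n_{i1}$, $\quad wFN = \sum_{i \in high} w_i m_{i0} n_{i1} + \sum_{i \in low} w_i m_{i0} n_{i1}$
$wFP = \sum_{i \in high} w_i m_{i1} n_{i0} + \sum_{i \in low} w_i m_{i1} n_{i0}$, $\quad wTN = \sum_{i \in high} w_i m_{i0} n_{i0} + \sum_{i \in low} w_i m_{i0} n_{i0}$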

31 Comparison of methods on example SNP pairs
[Two tables; only the chi-square p-values survive extraction]

method                                            a          b
Chi-square statistic (p-value)                    (0.221)    (0.009)
Balanced accuracy of MDR
Balanced accuracy of wba MDR (α = 0.25)
Balanced accuracy of fuzzy MDR (linear, OR = 8)

method                                            c          d
Chi-square statistic (p-value)                    (0.025)    (2.6E-5)
Balanced accuracy of MDR
Balanced accuracy of wba MDR (α = 0.25)
Balanced accuracy of fuzzy MDR (linear, OR = 8)

32 Penetrance table (MAF: 0.4, prevalence: 0.1) [repeat of slide 29: SNP_A genotypes AA (0.36), Aa (0.48), aa (0.16); SNP_B genotypes BB (0.36), Bb (0.48), bb (0.16); penetrances are the same across genotypes of each SNP alone but differ across genotype combinations]

33 Calculations of an example (B)
[Table; values not preserved. Columns:]
- genotype, # of cases, # of controls
- Original MDR: is high, is low, TP, FP, FN, TN
- wba MDR: OR, log(OR), TP, FP, FN, TN
- Fuzzy MDR: p_high, p_low, TP, FP, FN, TN
- a "sum" row for each method

34 Accuracy = 0.6


More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April

More information

Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes

Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes Decision Theoretic Classification of Copy-Number-Variation in Cancer Genomes Christopher Holmes (joint work with Chris Yau) Department of Statistics, & Wellcome Trust Centre for Human Genetics, University

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

Searching Genome-wide Disease Association Through SNP Data

Searching Genome-wide Disease Association Through SNP Data Georgia State University ScholarWorks @ Georgia State University Computer Science Dissertations Department of Computer Science 8-11-015 Searching Genome-wide Disease Association Through SNP Data Xuan Guo

More information

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics: Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships

More information

TEST SUMMARY AND FRAMEWORK TEST SUMMARY

TEST SUMMARY AND FRAMEWORK TEST SUMMARY Washington Educator Skills Tests Endorsements (WEST E) TEST SUMMARY AND FRAMEWORK TEST SUMMARY BIOLOGY Copyright 2014 by the Washington Professional Educator Standards Board 1 Washington Educator Skills

More information

QTL Mapping I: Overview and using Inbred Lines

QTL Mapping I: Overview and using Inbred Lines QTL Mapping I: Overview and using Inbred Lines Key idea: Looking for marker-trait associations in collections of relatives If (say) the mean trait value for marker genotype MM is statisically different

More information

Module Contact: Dr Doug Yu, BIO Copyright of the University of East Anglia Version 1

Module Contact: Dr Doug Yu, BIO Copyright of the University of East Anglia Version 1 UNIVERSITY OF EAST ANGLIA School of Biological Sciences Main Series UG Examination 2013-2014 EVOLUTIONARY BIOLOGY AND CONSERVATION GENETICS BIO-3C24 Time allowed: 3 hours Answer ALL questions in Section

More information

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

The Lander-Green Algorithm. Biostatistics 666 Lecture 22 The Lander-Green Algorithm Biostatistics 666 Lecture Last Lecture Relationship Inferrence Likelihood of genotype data Adapt calculation to different relationships Siblings Half-Siblings Unrelated individuals

More information

Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions

Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions Jason Westra Department of Statistics, Iowa State University Ames, IA 50011, United States

More information

Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing

Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-marker Association Testing Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota,

More information

Statistical Methods in Mapping Complex Diseases

Statistical Methods in Mapping Complex Diseases University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-12-2011 Statistical Methods in Mapping Complex Diseases Jing He University of Pennsylvania, jinghe@mail.med.upenn.edu

More information

Mapping QTL to a phylogenetic tree

Mapping QTL to a phylogenetic tree Mapping QTL to a phylogenetic tree Karl W Broman Department of Biostatistics & Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman Human vs mouse www.daviddeen.com 3 Intercross

More information

1 Errors in mitosis and meiosis can result in chromosomal abnormalities.

1 Errors in mitosis and meiosis can result in chromosomal abnormalities. Slide 1 / 21 1 Errors in mitosis and meiosis can result in chromosomal abnormalities. a. Identify and describe a common chromosomal mutation. Slide 2 / 21 Errors in mitosis and meiosis can result in chromosomal

More information

Analysis of Y-STR Profiles in Mixed DNA using Next Generation Sequencing

Analysis of Y-STR Profiles in Mixed DNA using Next Generation Sequencing Analysis of Y-STR Profiles in Mixed DNA using Next Generation Sequencing So Yeun Kwon, Hwan Young Lee, and Kyoung-Jin Shin Department of Forensic Medicine, Yonsei University College of Medicine, Seoul,

More information

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Rui Wang, Yong Li, XiaoFeng Wang, Haixu Tang and Xiaoyong Zhou Indiana University at Bloomington

More information

Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype.

Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype. Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype. please read pages 38-47; 49-55;57-63. Slide 1 of Chapter 2 1 Extension sot Mendelian Behavior of Genes Single gene inheritance

More information

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Zheyang Wu 1, Hongyu Zhao 1,2 * 1 Department of Epidemiology and Public Health, Yale University School of Medicine, New

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities

More information

Efficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study

Efficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study Efficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study Xiang Zhang A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial

More information

Evolution of phenotypic traits

Evolution of phenotypic traits Quantitative genetics Evolution of phenotypic traits Very few phenotypic traits are controlled by one locus, as in our previous discussion of genetics and evolution Quantitative genetics considers characters

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Weierstraß-Institut. für Angewandte Analysis und Stochastik. Leibniz-Institut im Forschungsverbund Berlin e. V. Preprint ISSN

Weierstraß-Institut. für Angewandte Analysis und Stochastik. Leibniz-Institut im Forschungsverbund Berlin e. V. Preprint ISSN Weierstraß-Institut für Angewandte Analysis und Stochastik Leibniz-Institut im Forschungsverbund Berlin e. V. Preprint ISSN 2198-5855 On an extended interpretation of linkage disequilibrium in genetic

More information

Performance Evaluation

Performance Evaluation Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,

More information

GSBHSRSBRSRRk IZTI/^Q. LlML. I Iv^O IV I I I FROM GENES TO GENOMES ^^^H*" ^^^^J*^ ill! BQPIP. illt. goidbkc. itip31. li4»twlil FIFTH EDITION

GSBHSRSBRSRRk IZTI/^Q. LlML. I Iv^O IV I I I FROM GENES TO GENOMES ^^^H* ^^^^J*^ ill! BQPIP. illt. goidbkc. itip31. li4»twlil FIFTH EDITION FIFTH EDITION IV I ^HHk ^ttm IZTI/^Q i I II MPHBBMWBBIHB '-llwmpbi^hbwm^^pfc ' GSBHSRSBRSRRk LlML I I \l 1MB ^HP'^^MMMP" jflp^^^^^^^^st I Iv^O FROM GENES TO GENOMES %^MiM^PM^^MWi99Mi$9i0^^ ^^^^^^^^^^^^^V^^^fii^^t^i^^^^^

More information

I Have the Power in QTL linkage: single and multilocus analysis

I Have the Power in QTL linkage: single and multilocus analysis I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department

More information

SNP Association Studies with Case-Parent Trios

SNP Association Studies with Case-Parent Trios SNP Association Studies with Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health July, 00 Acknowledgments Collaborators: Qing Li, Rob Scharpf, Holger Schwender,

More information

Miller & Levine Biology

Miller & Levine Biology A Correlation of To the Science Biology A Correlation of, 2014 to the, Table of Contents From Molecules to Organisms: Structures and Processes... 3 Ecosystems: Interactions, Energy, and Dynamics... 4 Heredity:

More information

Multiple QTL mapping

Multiple QTL mapping Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power

More information

Methods for High Dimensional Inferences With Applications in Genomics

Methods for High Dimensional Inferences With Applications in Genomics University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-12-2011 Methods for High Dimensional Inferences With Applications in Genomics Jichun Xie University of Pennsylvania,

More information

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per

More information