A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction
1 A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction Sangseob Leem, Hye-Young Jung, Sungyoung Lee and Taesung Park Bioinformatics and Biostatistics lab Seoul National University
2 Contents 1. Introduction 2. Motivation 3. Method 4. Results 5. Conclusion
3 Interaction between SNP 1 and SNP 2. In a single-locus association study: SNP 1 shows no effect; SNP 2 shows no effect. A reason for the missing heritability.
4 MDR method. Ritchie, M.D., et al. (2001) Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer, Am. J. Hum. Genet., 69. Input: SNP 1, SNP 2 and class. Steps: (1) calculate the case-control ratio of each genotype combination; (2) identify each combination as high or low risk; (3) build a 2 × 2 confusion matrix of true class (positive, negative) against predicted class (positive, negative) with cells TP, FP, FN and TN. [Tables: Case and Control counts for High risk and Low risk groups; surviving values: 12, 4]
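The three steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy counts, and the choice of the overall case/control ratio as the high-risk threshold (with ties labelled low risk) are all assumptions made for the example.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (SNP 1 genotype, SNP 2 genotype), each coded 0, 1 or 2


def mdr_confusion(counts: Dict[Cell, Tuple[int, int]]) -> Dict[str, int]:
    """Build the 2 x 2 confusion matrix of original MDR.

    counts maps each genotype combination to (number of cases, number of controls).
    """
    total_cases = sum(cases for cases, _ in counts.values())
    total_controls = sum(controls for _, controls in counts.values())
    cm = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for cases, controls in counts.values():
        # Assumed rule: high risk when cases/controls exceeds the overall ratio.
        # Cross-multiplying avoids dividing by zero for cells without controls.
        high_risk = cases * total_controls > controls * total_cases
        if high_risk:
            cm["TP"] += cases
            cm["FP"] += controls
        else:
            cm["FN"] += cases
            cm["TN"] += controls
    return cm


def balanced_accuracy(cm: Dict[str, int]) -> float:
    """Mean of sensitivity and specificity."""
    sensitivity = cm["TP"] / (cm["TP"] + cm["FN"])
    specificity = cm["TN"] / (cm["TN"] + cm["FP"])
    return 0.5 * (sensitivity + specificity)


# Hypothetical toy data: four genotype combinations, 30 cases and 30 controls.
toy_counts = {(0, 0): (12, 4), (0, 1): (3, 9), (1, 0): (5, 5), (1, 1): (10, 12)}
cm = mdr_confusion(toy_counts)
print(cm, balanced_accuracy(cm))
```

Every individual in a cell contributes a full count to exactly one risk group here, which is the crisp behaviour that the later slides relax.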
5 Weaknesses of MDR. Biological meaning: are all possible genotype interaction models really possible in the real world? (Log-linear model based MDR, Lee et al. 2007.) Computation time: increases exponentially with the interaction order. (Filtering-based approaches: Relief, ReliefF, TuRF, SURF. Processing of MDR on GPU: cuMDR.) Binary classification: cells with (# of cases, # of controls) = (2, 1) vs (20, 10), or (1, 11) vs (10, 20), receive the same risk label. (Next slide.)
6 Approaches to overcome simple binary classification. Model-based MDR, for (# of cases, # of controls) = (2, 1) vs (20, 10): Calle, M.L., et al. (2008) MB-MDR: model-based multifactor dimensionality reduction for detecting interactions in high-dimensional genomic data. Ternary classification: high, low and no-evidence groups. wba MDR (weighted balanced accuracy MDR), for (# of cases, # of controls) = (11, 1) vs (20, 10): Namkung, J., et al. (2009) New evaluation measures for multifactor dimensionality reduction classifiers in gene-gene interaction analysis, Bioinformatics, 25.
7 Fuzzy set theory. An extension of classical set theory: Zadeh, L.A. (1965) Fuzzy sets, Information and Control, 8. Degrees of membership: "rich or poor" vs "degree of rich", for example with 1/10/100/1000 dollars for a day. [Figure: membership degree running from poor to rich (1)]
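8 Key difference. Original MDR: Case/Control counts in the High risk and Low risk groups. wba MDR: an individual is added to its risk group with a weight (+1 × 2.5 in the example). Fuzzy MDR: an individual is added to both groups by membership degree (High risk: +1 × 0.05; Low risk: +1 × 0.95 in the example).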
9 Simple example [Tables: Case and Control counts by High risk and Low risk]
10 Criteria (x) of membership degree ($\mu_H$, $\mu_L$)
The estimate of the odds ratio (OR): $\hat{\theta}_i = \dfrac{n_{i1}/n_{i0}}{n_{+1}/n_{+0}}$
$n_{ij}$: the number of individuals with the $i$th multi-locus genotype in the $j$th disease group
$n_{+j}$: the total number of individuals in the $j$th disease group
$i = 1, \dots, 3^k$ for $k$ loci; $j = 1$ for case and $j = 0$ for control
(# of case, # of control): (2, 1) vs (20, 10)
Standardization: $z_i = \dfrac{\log \hat{\theta}_i}{\mathrm{SE}_i}$, with $\mathrm{SE}_i = \sqrt{\dfrac{1}{n_{i1}} - \dfrac{1}{n_{+1}} + \dfrac{1}{n_{i0}} - \dfrac{1}{n_{+0}}}$ and $\log \hat{\theta}_i = \log \dfrac{n_{i1}/n_{i0}}{n_{+1}/n_{+0}} = \log \dfrac{n_{i1}}{n_{+1}} - \log \dfrac{n_{i0}}{n_{+0}}$
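The (2, 1) vs (20, 10) contrast on this slide can be checked numerically. The sketch below is a hedged illustration: the function names and the group totals of 100 cases and 100 controls are assumptions, and the standard error is taken as the delta-method form sqrt(1/n_i1 - 1/n_+1 + 1/n_i0 - 1/n_+0), which is one reading of the slide's formula. Cells with a zero count are not handled.

```python
import math


def odds_ratio(n_i1: int, n_i0: int, n_cases: int, n_controls: int) -> float:
    """Case/control ratio of one genotype cell relative to the overall ratio."""
    return (n_i1 / n_i0) / (n_cases / n_controls)


def z_score(n_i1: int, n_i0: int, n_cases: int, n_controls: int) -> float:
    """Standardized log ratio: log(theta_i) divided by its assumed standard error."""
    log_theta = math.log(n_i1 / n_cases) - math.log(n_i0 / n_controls)
    se = math.sqrt(1 / n_i1 - 1 / n_cases + 1 / n_i0 - 1 / n_controls)
    return log_theta / se


# Assumed totals: 100 cases and 100 controls in the whole sample.
small_cell = (2, 1, 100, 100)
large_cell = (20, 10, 100, 100)

print(odds_ratio(*small_cell), odds_ratio(*large_cell))  # identical ratios
print(z_score(*small_cell), z_score(*large_cell))        # different z values
```

Both cells have the same ratio, but the larger cell has a smaller standard error and therefore a larger z, which is why standardization separates the two cases where the raw ratio cannot.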
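11 Membership function: Original MDR vs Fuzzy MDR
Linear membership function: $\mu_H(x) = \begin{cases} 0, & x < t_l \\ \dfrac{x - t_l}{t_h - t_l}, & t_l \le x < t_h \\ 1, & x \ge t_h \end{cases}$, $\mu_L(x) = 1 - \mu_H(x)$
Sigmoid membership function: $\mu_H(x) = \begin{cases} 0, & x < t_l \\ \dfrac{1}{1 + \dots}, & t_l \le x < t_h \\ 1, & x \ge t_h \end{cases}$, $\mu_L(x) = 1 - \mu_H(x)$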
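A linear membership function of the kind shown on this slide can be sketched as follows. The names `mu_high_linear`, `mu_low`, `t_low` and `t_high`, and the threshold values 0.5 and 2.0, are hypothetical choices for illustration; the sigmoid variant is left out because its exact form is not legible in the transcript.

```python
def mu_high_linear(x: float, t_low: float, t_high: float) -> float:
    """Degree of membership in the high-risk group for criterion value x."""
    if x < t_low:
        return 0.0
    if x >= t_high:
        return 1.0
    # Linear ramp between the two thresholds.
    return (x - t_low) / (t_high - t_low)


def mu_low(x: float, t_low: float, t_high: float) -> float:
    """Degree of membership in the low-risk group, the complement of mu_high."""
    return 1.0 - mu_high_linear(x, t_low, t_high)


# Hypothetical thresholds on the criterion scale.
for x in (0.25, 0.5, 1.25, 2.0, 4.0):
    print(x, mu_high_linear(x, 0.5, 2.0), mu_low(x, 0.5, 2.0))

# With t_low == t_high the ramp disappears and the function is a crisp step.
print(mu_high_linear(0.9, 1.0, 1.0), mu_high_linear(1.0, 1.0, 1.0))
```

Setting the two thresholds equal gives a 0/1 step at that value, which is one way to see a crisp high/low split as a limiting case of the fuzzy one.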
12 Tuning parameters
Notation: $F(y_m, y_s, y_{weight}, y_{threshold})$; 80 (2 × 2 × 4 × 5) combinations
Membership function: $y_m = l$ for the linear membership function, $y_m = s$ for the sigmoid membership function
Standardization: $y_s = 0$ for OR, $y_s = 1$ for z
Weights: $w_i = 1 + \ln(\mathrm{OR})^{i}$, $i = 0, 0.5, 1, 2$
Threshold values: 2, 4, 8, 16 and 32 for OR, and five values for z
13 Fuzzy MDR procedure(1) Consistent case/control ratio In two loci interactions,?c
14 Fuzzy MDR procedure (2): Original MDR vs Fuzzy MDR. Membership degrees depend on parameter values.
$TP = \sum_i n_{i1}\,\mu_H(x_i)$, $FN = \sum_i n_{i1}\,\mu_L(x_i)$, $FP = \sum_i n_{i0}\,\mu_H(x_i)$, $TN = \sum_i n_{i0}\,\mu_L(x_i)$
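The four sums on this slide translate directly into code. The sketch below assumes each genotype cell is given as (cases, controls, membership degree in the high-risk group), with the low-risk degree taken as the complement; the function name and the toy numbers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def fuzzy_confusion(cells: Iterable[Tuple[int, int, float]]) -> Dict[str, float]:
    """Fuzzy confusion matrix from (n_cases, n_controls, mu_high) per genotype cell."""
    cm = {"TP": 0.0, "FN": 0.0, "FP": 0.0, "TN": 0.0}
    for n_case, n_ctrl, mu_high in cells:
        mu_low = 1.0 - mu_high
        cm["TP"] += n_case * mu_high  # cases counted towards the high-risk group
        cm["FN"] += n_case * mu_low   # cases counted towards the low-risk group
        cm["FP"] += n_ctrl * mu_high  # controls counted towards the high-risk group
        cm["TN"] += n_ctrl * mu_low   # controls counted towards the low-risk group
    return cm


# Hypothetical cells with fuzzy membership degrees.
fuzzy = fuzzy_confusion([(12, 4, 0.9), (3, 9, 0.1), (5, 5, 0.5)])

# The same cells with degrees restricted to 0 or 1 give a crisp confusion matrix.
crisp = fuzzy_confusion([(12, 4, 1.0), (3, 9, 0.0), (5, 5, 0.0)])

print(fuzzy)
print(crisp)
```

When every membership degree is 0 or 1, each individual is counted in exactly one group, so the crisp confusion matrix appears as a special case of the fuzzy one.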
15 Empirical studies. Experiments on simulation data. Objectives: to compare the power of Fuzzy MDR with original MDR and wba MDR; to find optimal parameter values. Data generation: without marginal effects; with marginal effects. Parameters $F(y_m, y_s, y_{weight}, y_{threshold})$: linear/sigmoid, with/without SE, four weight values and five threshold values. Experiments on real data: bipolar disorder (BD) data from the Wellcome Trust Case Control Consortium (WTCCC).
16 Data without marginal effects. Structure: four sample sizes (200, 400, 800 and 1600 samples); 1000 SNPs; two causative SNPs; 70 penetrance tables (7 heritability values × 2 minor allele frequencies × 5 models). Example of penetrance table (Model 1): columns AA, Aa, aa; rows BB, Bb, bb.
17 200 sample results heritability MAF
18 400 sample results heritability MAF
19 800 sample results heritability MAF
20 1600 sample results heritability MAF
21 Data with marginal effects. Structure: one sample size (2000 cases and 2000 controls); 1000 SNPs; two causative SNPs; 18 penetrance tables (3 models × 3 minor allele frequencies × 2 linkage disequilibrium values).
Model 1
      AA         Aa         aa
BB    1          1+θ        (1+θ)^2
Bb    1+θ        (1+θ)^2    (1+θ)^3
bb    (1+θ)^2    (1+θ)^3    (1+θ)^4
Model 2
      AA         Aa         aa
BB
Bb    1          1+θ        (1+θ)^2
bb    1          (1+θ)^2    (1+θ)^4
Model 3
      AA         Aa         aa
BB
Bb    1          1+θ        1+θ
bb    1          1+θ        1+θ
22 Results data with marginal effects Model LD MAF
23 Real data: BD in WTCCC. 1868 cases and 2938 controls. 19 SNPs are selected by a literature review.
Table columns: index, rs number, MAF, chromosome (position), gene, p-value (rank). Surviving columns:
index  gene             p-value (rank)
1      -                9.82E-06 (8)
2      -                1.83E-05 (12)
3      DPP              E-05 (10)
4      RNPEPL1          5.03E-06 (3)
5      CMTM8            1.45E-05 (11)
6      LAMP3            5.25E-06 (4)
7      SORCS2           1.13E-01 (17)
8      GLTSCR1L, LOC    E-06 (2)
9      -                5.39E-05 (14)
10     DFNB             E-05 (13)
11     CACNA1C          9.72E-04 (15)
12     TSPAN8           7.22E-02 (16)
13     DGKH             6.23E-01 (19)
14     SLC35F4          1.15E-05 (9)
15     TDRD9            7.69E-06 (6)
16     PALB2            1.33E-07 (1)
17     -                9.18E-06 (7)
18     MYO5B            4.79E-01 (18)
19     CDC25B           7.47E-06 (5)
Two parameter settings:
1. F(L,0,0,3): linear membership, without SE, without weight, threshold OR = 8
2. F(S,1,1,2): sigmoid membership, with SE, $w_1 = 1 + \ln(\mathrm{OR})^{1}$, threshold z = 2 × 1.96
24 Result of BD in WTCCC, F(S,1,1,2). Table columns: order, SNP combination, training accuracy, testing accuracy, CVC. Surviving SNP combination fragments: "6", "6, 14", "6, 9, 11". Highlighted SNPs: index 15, TDRD9 (Tudor Domain Containing 9), p-value 7.69E-06 (rank 6); index 19, CDC25B (Cell Division Cycle 25B), p-value 7.47E-06 (rank 5).
25 Fuzzy MDR vs Original MDR (Interaction model ) Fuzzy MDR Original MDR M11 has been discovered in real world! M M
26 5. Conclusion. A novel and powerful Fuzzy MDR for gene-gene interaction analysis. Based on fuzzy set theory: the H and L risk groups are fuzzy sets. Original MDR is a special case of Fuzzy MDR. More flexible interpretation through the degree of membership of each multi-locus genotype. Potential for extension. Future work: determining the optimal tuning parameter values; extensions.
27 Thank you.
28 References
Ritchie, M.D., et al. (2001) Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer, Am J Hum Genet, 69.
Velez, D.R., et al. (2007) A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction, Genet Epidemiol, 31.
Leem, S., et al. (2014) Fast detection of high-order epistatic interactions in genome-wide association studies using information theoretic measure, Computational Biology and Chemistry, 50.
Burton, P.R., et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls, Nature, 447.
Li, W. and Reich, J. (2000) A Complete Enumeration and Classification of Two-Locus Disease Models, Human Heredity, 50.
29 Limit of single-locus association studies <Penetrance table> MAF: 0.4 Prevalence: 0.1 SNP_B SNP_A AA (0.36) Aa (0.48) aa (0.16) BB (0.36) Bb (0.48) bb (0.16) P SNP_A Penetrances are the same across genotypes in SNP_A. Penetrances are the same across genotypes in SNP_B. Penetrances are different in genotype combinations of SNP_A and SNP_B P SNP_B
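30 Method: confusion matrix calculation, with $m_{i1}$ and $m_{i0}$ the membership degrees of genotype $i$ in the high- and low-risk groups and $w_i$ its weight
Original: $wTP = \sum_{i \in high} n_{i1}$, $wFP = \sum_{i \in high} n_{i0}$, $wFN = \sum_{i \in low} n_{i1}$, $wTN = \sum_{i \in low} n_{i0}$
Weighted: $wTP = \sum_{i \in high} w_i n_{i1}$, $wFP = \sum_{i \in high} w_i n_{i0}$, $wFN = \sum_{i \in low} w_i n_{i1}$, $wTN = \sum_{i \in low} w_i n_{i0}$
Fuzzy: $wTP = \sum_{i \in high} m_{i1} n_{i1} + \sum_{i \in low} m_{i1} n_{i1}$, $wFN = \sum_{i \in high} m_{i0} n_{i1} + \sum_{i \in low} m_{i0} n_{i1}$, $wFP = \sum_{i \in high} m_{i1} n_{i0} + \sum_{i \in low} m_{i1} n_{i0}$, $wTN = \sum_{i \in high} m_{i0} n_{i0} + \sum_{i \in low} m_{i0} n_{i0}$
Weighted fuzzy: $wTP = \sum_{i \in high} w_i m_{i1} n_{i1} + \sum_{i \in low} w_i m_{i1} n_{i1}$, $wFN = \sum_{i \in high} w_i m_{i0} n_{i1} + \sum_{i \in low} w_i m_{i0} n_{i1}$, $wFP = \sum_{i \in high} w_i m_{i1} n_{i0} + \sum_{i \in low} w_i m_{i1} n_{i0}$, $wTN = \sum_{i \in high} w_i m_{i0} n_{i0} + \sum_{i \in low} w_i m_{i0} n_{i0}$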
31 Comparison of methods on example SNPs. Table 1, columns a and b: Chi-square statistic (p-value): a (0.221), b (0.009); rows for balanced accuracy of MDR, balanced accuracy of wba MDR (α = 0.25) and balanced accuracy of fuzzy MDR (linear, OR = 8). Table 2, columns c and d: Chi-square statistic (p-value): c (0.025), d (2.6E-5); rows for balanced accuracy of MDR, balanced accuracy of wba MDR (α = 0.25) and balanced accuracy of fuzzy MDR (linear, OR = 8).
32 <Penetrance table> MAF: 0.4 Prevalence: 0.1 SNP_B SNP_A AA (0.36) Aa (0.48) aa (0.16) BB (0.36) Bb (0.48) bb (0.16) P SNP_A Penetrances are the same across genotypes in SNP_A. Penetrances are the same across genotypes in SNP_B. Penetrances are different in genotype combinations of SNP_A and SNP_B P SNP_B
33 Calculations of an example (B). Table columns: genotype, # of case, # of control; Original MDR: is high, is low, TP, FP, FN, TN; wba MDR: OR, log(OR), TP, FP, FN, TN; Fuzzy MDR: p_high, p_low, TP, FP, FN, TN. Final row: sum for each method.
34 Accuracy = 0.6
More informationChapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype.
Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype. please read pages 38-47; 49-55;57-63. Slide 1 of Chapter 2 1 Extension sot Mendelian Behavior of Genes Single gene inheritance
More informationStatistical Power of Model Selection Strategies for Genome-Wide Association Studies
Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Zheyang Wu 1, Hongyu Zhao 1,2 * 1 Department of Epidemiology and Public Health, Yale University School of Medicine, New
More informationCalculation of IBD probabilities
Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities
More informationEfficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study
Efficient Algorithms for Detecting Genetic Interactions in Genome-Wide Association Study Xiang Zhang A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial
More informationEvolution of phenotypic traits
Quantitative genetics Evolution of phenotypic traits Very few phenotypic traits are controlled by one locus, as in our previous discussion of genetics and evolution Quantitative genetics considers characters
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary
More informationWeierstraß-Institut. für Angewandte Analysis und Stochastik. Leibniz-Institut im Forschungsverbund Berlin e. V. Preprint ISSN
Weierstraß-Institut für Angewandte Analysis und Stochastik Leibniz-Institut im Forschungsverbund Berlin e. V. Preprint ISSN 2198-5855 On an extended interpretation of linkage disequilibrium in genetic
More informationPerformance Evaluation
Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,
More informationGSBHSRSBRSRRk IZTI/^Q. LlML. I Iv^O IV I I I FROM GENES TO GENOMES ^^^H*" ^^^^J*^ ill! BQPIP. illt. goidbkc. itip31. li4»twlil FIFTH EDITION
FIFTH EDITION IV I ^HHk ^ttm IZTI/^Q i I II MPHBBMWBBIHB '-llwmpbi^hbwm^^pfc ' GSBHSRSBRSRRk LlML I I \l 1MB ^HP'^^MMMP" jflp^^^^^^^^st I Iv^O FROM GENES TO GENOMES %^MiM^PM^^MWi99Mi$9i0^^ ^^^^^^^^^^^^^V^^^fii^^t^i^^^^^
More informationI Have the Power in QTL linkage: single and multilocus analysis
I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department
More informationSNP Association Studies with Case-Parent Trios
SNP Association Studies with Case-Parent Trios Department of Biostatistics Johns Hopkins Bloomberg School of Public Health July, 00 Acknowledgments Collaborators: Qing Li, Rob Scharpf, Holger Schwender,
More informationMiller & Levine Biology
A Correlation of To the Science Biology A Correlation of, 2014 to the, Table of Contents From Molecules to Organisms: Structures and Processes... 3 Ecosystems: Interactions, Energy, and Dynamics... 4 Heredity:
More informationMultiple QTL mapping
Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power
More informationMethods for High Dimensional Inferences With Applications in Genomics
University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-12-2011 Methods for High Dimensional Inferences With Applications in Genomics Jichun Xie University of Pennsylvania,
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More information