NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees
1 NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana Champaign RECOMB-CG October 11, 2018 Molloy & Warnow NJMerge 1/58
2 Motivation: Project to sequence the genomes of 10,000 vertebrate species.
3 Motivation: Project to sequence the genome of at least one individual from each of the 66,000 vertebrate species.
4 Challenges: The most accurate methods attempt to solve NP-hard optimization problems (e.g., maximum likelihood). The solution space (of unrooted binary trees on N leaves) grows exponentially with N.
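To make the growth of the solution space concrete, here is a minimal sketch that counts unrooted binary trees on N labeled leaves using the standard formula (2N-5)!! = 1 x 3 x 5 x ... x (2N-5). The function name is an illustrative choice, not something from the slides.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary trees on n_leaves labeled leaves: (2N-5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2N-5 (the factor 1 is implicit).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_binary_trees(n))
```

Even at N = 10 there are already about two million candidate topologies, which is why exhaustive search is not an option at the scale the slides discuss.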
5 Disk Covering Methods [e.g., Nakhleh et al., 2001; Nelesen et al., 2012; Bayzid et al., 2014]
12 Divide-and-Conquer Strategy using NJMerge
14 Merged together = Compatibility Supertree
17 NJMerge is an extension of Neighbor Joining (NJ) [Saitou and Nei, 1987]. NJ runs in O(N^3) time.
18 Neighbor Joining [Saitou and Nei, 1987]: Start with a set of nodes N corresponding to the N leaves. Then (1) identify a siblinghood, i.e., a pair of nodes x and y; (2) remove nodes x and y from N; (3) add node (x, y) to N. Repeat until only one node remains.
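The loop described on this slide can be sketched as follows. This is a minimal, unoptimized version of classic neighbor joining that returns only the topology as nested tuples. It assumes the usual Q-criterion for choosing the pair and, as a common convention for unrooted trees, stops when three nodes remain rather than one. The function name and the input format (a dict keyed by leaf pairs) are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining(leaves, dist):
    """Return an unrooted topology as nested tuples (root has three children).

    leaves: list of leaf labels (strings).
    dist:   dict mapping (a, b) pairs to distances; each pair given once.
    """
    nodes = list(leaves)
    d = {frozenset(pair): value for pair, value in dist.items()}

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # Row sums used by the Q-criterion.
        r = {x: sum(get(x, z) for z in nodes if z != x) for x in nodes}
        # Siblinghood proposal: the pair minimizing Q(x, y).
        x, y = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * get(p[0], p[1]) - r[p[0]] - r[p[1]],
        )
        merged = (x, y)
        # Distances from the new node (x, y) to every remaining node.
        for z in nodes:
            if z != x and z != y:
                d[frozenset((merged, z))] = 0.5 * (
                    get(x, z) + get(y, z) - get(x, y)
                )
        nodes = [z for z in nodes if z != x and z != y] + [merged]
    return tuple(nodes)


# Additive distances for the tree ((A, B), C, (D, E)) with unit branch lengths.
example_dist = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 4,
    ("C", "D"): 3, ("C", "E"): 3, ("D", "E"): 2,
}
example_tree = neighbor_joining(list("ABCDE"), example_dist)

if __name__ == "__main__":
    print(example_tree)
```

Recomputing the row sums and scanning all pairs in every iteration gives the cubic running time mentioned on the previous slide.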
19 NJMerge NOTE: The distance matrix is additive for the tree (((A, B), (C, D)), E, (F, (G, H)));
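As an illustration of what "additive" means for this tree, the sketch below builds leaf-to-leaf path distances for (((A, B), (C, D)), E, (F, (G, H))) and checks the four-point condition. The slide does not give branch lengths, so unit lengths are assumed here, and all names are hypothetical.

```python
from itertools import combinations

TREE = ((("A", "B"), ("C", "D")), "E", ("F", ("G", "H")))


def build_adjacency(tree):
    """Turn nested tuples into an adjacency dict with unit branch lengths (assumed)."""
    adj = {}
    counter = [0]

    def add(node):
        if isinstance(node, str):
            adj.setdefault(node, {})
            return node
        name = f"_internal{counter[0]}"
        counter[0] += 1
        adj[name] = {}
        for child in node:
            child_name = add(child)
            adj[name][child_name] = 1.0
            adj[child_name][name] = 1.0
        return name

    add(tree)
    return adj


def leaf_distances(adj):
    """Path length between every pair of leaves, by traversal from each leaf."""
    leaves = sorted(v for v in adj if not v.startswith("_internal"))
    dist = {}
    for src in leaves:
        seen = {src: 0.0}
        stack = [src]
        while stack:
            v = stack.pop()
            for w, length in adj[v].items():
                if w not in seen:
                    seen[w] = seen[v] + length
                    stack.append(w)
        for dst in leaves:
            dist[src, dst] = seen[dst]
    return leaves, dist


def is_additive(leaves, dist, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for a, b, c, d in combinations(leaves, 4):
        sums = sorted(
            [dist[a, b] + dist[c, d], dist[a, c] + dist[b, d], dist[a, d] + dist[b, c]]
        )
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


adjacency = build_adjacency(TREE)
leaves, dist = leaf_distances(adjacency)

if __name__ == "__main__":
    print(is_additive(leaves, dist))
```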
20 NJMerge: Before accepting a siblinghood proposal, check whether it (1) violates a constraint tree or (2) makes the set of constraint trees incompatible.
22 NJMerge: Determining the compatibility of a set of unrooted phylogenetic trees is NP-complete [Steel, 1992; Warnow, 1994]. NJMerge uses a polynomial-time heuristic: update the constraint trees based on the siblinghood proposal (x, y), then test the compatibility of the constraint trees with leaf (x, y) in polynomial time [Aho et al., 1981]. Not every constraint tree will contain leaf (x, y), so it is possible for NJMerge to fail.
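The polynomial-time test cited here is the algorithm of Aho et al. (1981), often called BUILD, which decides whether a set of rooted triplets can be displayed by a single rooted tree. Below is a minimal sketch of BUILD on triplets. How NJMerge roots and encodes its constraint trees is not shown on the slide, so the input format (tuples (a, b, c) meaning "a and b are closer to each other than either is to c") is an assumption.

```python
def build(leaves, triplets):
    """Return a rooted tree (nested tuples) displaying all triplets, or None.

    triplets: iterable of (a, b, c), read as the rooted triplet ab|c.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    if len(leaves) == 2:
        return tuple(sorted(leaves))

    # Union-find over the current leaf set.
    parent = {v: v for v in leaves}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Only triplets entirely inside the current leaf set matter.
    relevant = [t for t in triplets if set(t) <= leaves]
    for a, b, _c in relevant:
        parent[find(a)] = find(b)

    groups = {}
    for v in leaves:
        groups.setdefault(find(v), set()).add(v)
    if len(groups) == 1:
        # A single connected component means the triplets conflict.
        return None

    subtrees = []
    for group in sorted(groups.values(), key=sorted):
        subtree = build(group, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


if __name__ == "__main__":
    print(build("ABCD", [("A", "B", "C"), ("C", "D", "A")]))
    print(build("ABC", [("A", "B", "C"), ("A", "C", "B")]))
```

Because a rooted test is polynomial while the unrooted problem is NP-complete, a heuristic built on it can reject or miss some cases, which is consistent with the slide's remark that NJMerge can fail.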
23 Is NJMerge an effective approach in theory?
24 Correctness Guarantee (in preparation for journal extension). Theorem: Given constraint trees that agree with T and a distance matrix that is nearly additive for T, NJMerge returns T. Footnote 1: An n × n matrix M is nearly additive if each entry M[i, j] differs from the distance between leaf i and leaf j in T by less than one half of the shortest branch length in T.
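The footnote's definition translates directly into a check. The sketch below assumes the estimated matrix M and the true path-distance matrix D are given as nested lists with the same leaf order; the function and argument names are illustrative.

```python
def is_nearly_additive(M, D, shortest_branch):
    """True if every |M[i][j] - D[i][j]| is strictly below shortest_branch / 2."""
    n = len(M)
    return all(
        abs(M[i][j] - D[i][j]) < shortest_branch / 2
        for i in range(n)
        for j in range(n)
    )


# True distances for a three-leaf star-like example (shortest branch length 1).
D_true = [[0, 2, 3], [2, 0, 3], [3, 3, 0]]
M_close = [[0, 2.4, 3], [2.4, 0, 2.7], [3, 2.7, 0]]
M_far = [[0, 2.5, 3], [2.5, 0, 3], [3, 3, 0]]

if __name__ == "__main__":
    print(is_nearly_additive(M_close, D_true, 1.0))
    print(is_nearly_additive(M_far, D_true, 1.0))
```

The inequality is strict, so an entry that is off by exactly half the shortest branch length does not qualify.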
25 Statistical Consistency (in preparation for journal extension). Corollary: NJMerge can be used in gene tree estimation pipelines that are statistically consistent under the GTR model of sequence evolution.
26 Statistical Consistency (in preparation for journal extension). Corollary: NJMerge can be used in species tree estimation pipelines that are statistically consistent under the Multi-Species Coalescent model. Footnote 2: The proof assumes that both the number of genes and the number of sites per gene go to infinity.
27 Is NJMerge an effective approach in practice?
28 Species Tree Estimation
29 Gene tree matching the species tree
30 Gene tree differing from the species tree due to Incomplete Lineage Sorting (ILS)
31 Species Tree Estimation Pipelines
36 Species Tree Estimation Pipelines using NJMerge. Distance matrix: (1) estimate gene trees using FastTree-2 [Price et al., 2010]; (2) compute the average number of edges between pairs of species [e.g., Liu and Yu, 2011]. Constraint trees: ASTRAL-III, or unpartitioned concatenation using RAxML.
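Step (2) of the distance-matrix construction can be sketched as below: for every pair of species, count the edges on the path between them in each gene tree and average over the gene trees that contain both. Gene trees are assumed to be unrooted topologies written as nested tuples with a degree-3 root; this encoding and the function names are assumptions, and the slides do not say how missing species are handled.

```python
from collections import defaultdict
from itertools import combinations


def leaf_paths(tree, path=(), out=None):
    """Map each leaf to its sequence of child indices from the root."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = path
    else:
        for i, child in enumerate(tree):
            leaf_paths(child, path + (i,), out)
    return out


def edge_distance(path_a, path_b):
    """Number of edges between two leaves, given their root-to-leaf paths."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def average_edge_distances(gene_trees):
    """Average edge count per species pair over the trees containing both."""
    total = defaultdict(float)
    count = defaultdict(int)
    for tree in gene_trees:
        paths = leaf_paths(tree)
        for a, b in combinations(sorted(paths), 2):
            total[a, b] += edge_distance(paths[a], paths[b])
            count[a, b] += 1
    return {pair: total[pair] / count[pair] for pair in total}


gene_trees = [
    (("A", "B"), "C", "D"),
    (("A", "C"), "B", "D"),
]
avg = average_edge_distances(gene_trees)

if __name__ == "__main__":
    for pair in sorted(avg):
        print(pair, avg[pair])
```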
37 Evaluation Metrics: Robinson-Foulds (RF) distance between the true and the estimated species trees; running time. All methods were limited to a single compute node with 16 cores, 64 GB of physical memory, and 48 hours maximum wall-clock time.
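For reference, the RF distance between two unrooted trees on the same leaf set is the number of non-trivial bipartitions found in one tree but not the other. The sketch below computes the unnormalized count on nested-tuple trees; the slides do not say whether a normalized variant was reported, and the names here are illustrative.

```python
def leaf_set(tree):
    """All leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Non-trivial bipartitions (both sides have at least two leaves)."""
    all_leaves = leaf_set(tree)
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:
            splits.add(frozenset([clade, other]))
        return clade

    walk(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Size of the symmetric difference of the two bipartition sets."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


true_tree = (("A", "B"), "C", ("D", "E"))
estimated_tree = (("A", "C"), "B", ("D", "E"))

if __name__ == "__main__":
    print(rf_distance(true_tree, estimated_tree))
```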
38 Running time: $T = \left(T_1^{\text{Method}} + \cdots + T_k^{\text{Method}}\right) + T^{\text{NJMerge}}$, where $k$ is the number of subsets, $T_i^{\text{Method}}$ is the time to compute a tree on subset $i$ with the given method, and $T^{\text{NJMerge}}$ is the time to run NJMerge.
39 Case 1: ASTRAL-III [Mirarab et al., 2014; Mirarab and Warnow, 2015; Zhang et al., 2018]. Uses dynamic programming to solve the Maximum Quartet Consistency problem within a constrained search space. Statistically consistent under the Multi-Species Coalescent model (assuming true gene trees).
40 Case 1: ASTRAL-III on 1000-taxon, 1000-gene datasets. [Figure: running time (m) for ASTRAL and NJMerge+ASTRAL; panels: Moderate ILS, Very High ILS.] NOTE: ASTRAL did not complete within 48 hours on 4/20 datasets with high ILS.
41 Case 1: ASTRAL-III on 1000-taxon, 1000-gene datasets. [Figure: species tree error for ASTRAL and NJMerge+ASTRAL; panels: Moderate ILS, Very High ILS.] NOTE: ASTRAL did not complete within 48 hours on 4/20 datasets with high ILS.
42 Case 2: RAxML [Stamatakis, 2014]. Uses heuristics to search for the tree with the best likelihood score. Not statistically consistent under the Multi-Species Coalescent model [Roch and Steel, 2015].
43 Case 2: RAxML. [Figure: running time (m) for RAxML and NJMerge+RAxML by level of ILS (Moderate, Very High); panels: 100 species, 1000 introns; 1000 species, 1000 introns.] NOTE: Bar graphs show all datasets on which at least one method completed.
44 Case 2: RAxML on 1000-taxon, 1000-gene datasets. Because of out-of-memory errors, RAxML could run on only 0/20 datasets with moderate ILS and 1/20 with high ILS. On the one dataset where RAxML could run, it did not complete within 48 hours. In contrast, NJMerge+RAxML completed on 40/40 of these datasets in less than 30 hours (often in less than 17 hours).
45 Case 2: RAxML. [Figure: species tree error for RAxML and NJMerge+RAxML by level of ILS (Moderate, Very High); panels: 100 species, 1000 introns; 1000 species, 1000 introns.] NOTE: Box plots show all datasets on which at least one method completed.
46 Conclusions: NJMerge (1) runs in polynomial time; (2) can be used to build statistically consistent divide-and-conquer pipelines; (3) enabled ASTRAL-III and RAxML to analyze 1000-taxon datasets using a single compute node with 16 cores, 64 GB of physical memory, and 48 hours maximum wall-clock time; (4) never failed on 1000-taxon datasets and failed on less than 1% of all datasets tested.
47 Funding: This work was supported by the U.S. National Science Foundation Graduate Research Fellowship Program (DGE ), Blue Waters Sustained-Petascale Computing Project (OCI and ACI ), Graph-Theoretic Algorithms to Improve Phylogenomic Analyses (CCF ), and the Ira & Debra Cohen Graduate Fellowship in Computer Science.
48 Which species tree method should I use? [Figure: species tree error on 1000-taxon, 1000-gene datasets; panels: Moderate ILS, Very High ILS; methods: NJMerge+ASTRAL, NJMerge+RAxML, NJMerge+SVDquartets, NJst.]
49 How does the estimated distance matrix impact NJMerge? [Figure: species tree error vs. number of genes; panels: 100 taxa, Moderate ILS; 100 taxa, Very High ILS; methods: NJst, NJMerge+True.]
50 References
Nakhleh et al. (2001). Designing fast converging phylogenetic methods. Bioinformatics 17(Suppl. 1):S190-S198.
Nelesen et al. (2012). DACTAL: Divide-And-Conquer Trees (almost) without Alignments. Bioinformatics 28(12):i274-i282.
Bayzid et al. (2014). Disk Covering Methods Improve Phylogenomic Analyses. BMC Genomics 15(Suppl 6):S7.
Jiang et al. (2001). A polynomial time approximation scheme for inferring evolutionary trees from quartet topologies and its application. SIAM Journal on Computing 30(6):
Bansal et al. (2010). Robinson-Foulds Supertrees. Algorithms for Molecular Biology 5(1).
Saitou and Nei (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4:
Steel (1992). The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification 9:
Warnow (1994). Tree compatibility and inferring evolutionary history. Journal of Algorithms 16(3):
Aho et al. (1981). Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions. SIAM Journal on Computing 10(3):
Atteson (1999). The Performance of Neighbor-Joining Methods of Phylogenetic Reconstruction. Algorithmica 25(2-3):
Price et al. (2010). FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3):1-10.
Liu and Yu (2011). Estimating Species Trees from Unrooted Gene Trees. Systematic Biology 60(5):
Mirarab et al. (2015). PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences. Journal of Computational Biology 22(5):
Mirarab et al. (2014). ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17):i541-i548.
Mirarab and Warnow (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12):i44-i52.
Zhang et al. (2017). ASTRAL-III: Increased Scalability and Impacts of Contracting Low Support Branches. In Meidanis and Nakhleh, editors, Comparative Genomics. RECOMB-CG Lecture Notes in Computer Science, volume Springer, Cham.
Stamatakis (2014). RAxML Version 8: A Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 30(9).
Roch and Steel (2015). Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theoretical Population Biology 100C:
Chifman and Kubatko (2014). Quartet Inference from SNP Data Under the Coalescent Model. Bioinformatics 30(23):
55 Correctness Guarantee (in preparation for journal extension). Proof: NJ applied to a nearly additive distance matrix for T will return T [Atteson, 1999]. Because all constraint trees agree with T, the siblinghood proposals made by NJ will never violate a constraint tree, so NJMerge will simply operate as traditional NJ and return T.
56 Species Tree Estimation Pipelines using NJMerge
57 Simulated Datasets. Dataset size: 100 and 1000 species; 1000 genes varying in length from 300 to 1500 bp. Level of Incomplete Lineage Sorting (ILS): Moderate, 8-10% AD; Very high, 68-69% AD, where AD is the average RF distance between the true species tree and the true gene trees.
58 ASTRAL [Mirarab et al., 2014; Mirarab and Warnow, 2015; Zhang et al., 2018]. Uses dynamic programming to solve the Maximum Quartet Consistency problem within a constrained search space X. Runs in O(nm X ), where n is the number of species and m is the number of genes. For ASTRAL-III, X = O(nm), so it runs in O((nm) ).
More informationA Minimum Spanning Tree Framework for Inferring Phylogenies
A Minimum Spanning Tree Framework for Inferring Phylogenies Daniel Giannico Adkins Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-010-157
More informationSTEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)
STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu
More informationBIOINFORMATICS DISCOVERY NOTE
BIOINFORMATICS DISCOVERY NOTE Designing Fast Converging Phylogenetic Methods!" #%$&('$*),+"-%./ 0/132-%$ 0*)543768$'9;:(0'=A@B2$0*)A@B'9;9CD
More informationAmira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological
More informationRECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS
RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS KT Huber, V Moulton, C Semple, and M Steel Department of Mathematics and Statistics University of Canterbury Private Bag 4800 Christchurch,
More informationSUPPLEMENTARY INFORMATION
Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,
More informationarxiv: v1 [q-bio.pe] 1 Jun 2014
THE MOST PARSIMONIOUS TREE FOR RANDOM DATA MAREIKE FISCHER, MICHELLE GALLA, LINA HERBST AND MIKE STEEL arxiv:46.27v [q-bio.pe] Jun 24 Abstract. Applying a method to reconstruct a phylogenetic tree from
More informationarxiv: v1 [cs.cc] 9 Oct 2014
Satisfying ternary permutation constraints by multiple linear orders or phylogenetic trees Leo van Iersel, Steven Kelk, Nela Lekić, Simone Linz May 7, 08 arxiv:40.7v [cs.cc] 9 Oct 04 Abstract A ternary
More informationA New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data
A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data Mary E. Cosner Dept. of Plant Biology Ohio State University Li-San Wang Dept.
More informationA (short) introduction to phylogenetics
A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field
More informationarxiv: v1 [q-bio.pe] 6 Jun 2013
Hide and see: placing and finding an optimal tree for thousands of homoplasy-rich sequences Dietrich Radel 1, Andreas Sand 2,3, and Mie Steel 1, 1 Biomathematics Research Centre, University of Canterbury,
More informationDr. Amira A. AL-Hosary
Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological
More informationRegular networks are determined by their trees
Regular networks are determined by their trees Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu February 17, 2009 Abstract. A rooted acyclic digraph
More informationInferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution
Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods
More informationInferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting
arxiv:1509.06075v3 [q-bio.pe] 12 Feb 2016 Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting Claudia Solís-Lemus 1 and Cécile Ané 1,2 1 Department of Statistics,
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationarxiv: v1 [cs.ds] 1 Nov 2018
An O(nlogn) time Algorithm for computing the Path-length Distance between Trees arxiv:1811.00619v1 [cs.ds] 1 Nov 2018 David Bryant Celine Scornavacca November 5, 2018 Abstract Tree comparison metrics have
More informationPhylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches
Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell
More informationPhylogenetics. BIOL 7711 Computational Bioscience
Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium
More informationFrom Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis. Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science
From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science!1 Adapted from U.S. Department of Energy Genomic Science
More informationConsensus properties for the deep coalescence problem and their application for scalable tree search
PROCEEDINGS Open Access Consensus properties for the deep coalescence problem and their application for scalable tree search Harris T Lin 1, J Gordon Burleigh 2, Oliver Eulenstein 1* From 7th International
More informationPoint of View. Why Concatenation Fails Near the Anomaly Zone
Point of View Syst. Biol. 67():58 69, 208 The Author(s) 207. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email:
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationWenEtAl-biorxiv 2017/12/21 10:55 page 2 #2
WenEtAl-biorxiv 0// 0: page # Inferring Phylogenetic Networks Using PhyloNet Dingqiao Wen, Yun Yu, Jiafan Zhu, Luay Nakhleh,, Computer Science, Rice University, Houston, TX, USA; BioSciences, Rice University,
More informationSupplementary Information
Supplementary Information For the article"comparable system-level organization of Archaea and ukaryotes" by J. Podani, Z. N. Oltvai, H. Jeong, B. Tombor, A.-L. Barabási, and. Szathmáry (reference numbers
More informationInference of Parsimonious Species Phylogenies from Multi-locus Data
RICE UNIVERSITY Inference of Parsimonious Species Phylogenies from Multi-locus Data by Cuong V. Than A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE Doctor of Philosophy APPROVED,
More informationPhylogenetics: Parsimony
1 Phylogenetics: Parsimony COMP 571 Luay Nakhleh, Rice University he Problem 2 Input: Multiple alignment of a set S of sequences Output: ree leaf-labeled with S Assumptions Characters are mutually independent
More information