NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

1 NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, University of Illinois at Urbana-Champaign RECOMB-CG October 11, 2018 Molloy & Warnow NJMerge 1/58

2 Motivation: Project to sequence the genomes of 10,000 vertebrate species.

3 Motivation: Project to sequence the genome of at least one individual from each of the 66,000 vertebrate species.

4 Challenges: The most accurate methods attempt to solve NP-hard optimization problems (e.g., maximum likelihood). The solution space (of unrooted binary trees on N leaves) grows exponentially with N.
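To make the growth of the solution space concrete, the sketch below counts unrooted binary tree topologies on N labeled leaves using the standard formula (2N-5)!! = 1 x 3 x 5 x ... x (2N-5), valid for N >= 3. The function name is hypothetical and chosen only for illustration.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        return 1  # fewer than three leaves admit a single topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for odd in range(3, 2 * n_leaves - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_binary_trees(n))
```

Already at 10 leaves there are more than two million topologies, which is why exhaustive search is not an option at the 1000-taxon scale.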

5 Disk Covering Methods [e.g., Nakhleh et al., 2001; Nelesen et al., 2012; Bayzid et al., 2014]

12 Divide-and-Conquer Strategy using NJMerge

14 Merged together = Compatibility Supertree

17 NJMerge is an extension of Neighbor Joining (NJ) [Saitou and Nei, 1987]. NJ runs in O(N^3) time.

18 Neighbor Joining [Saitou and Nei, 1987]: Start with a set of nodes corresponding to the N leaves. Then: (1) Identify a siblinghood, i.e., a pair of nodes x and y. (2) Remove nodes x and y from the set. (3) Add node (x, y) to the set. (4) Repeat until only one node remains.
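The loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration of textbook NJ, not the NJMerge code: it tracks topology only (no branch lengths), represents the distance matrix as a dictionary keyed by unordered pairs, and stops when two nodes remain, since those are joined by the final edge. The function name and data layout are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(leaves, dist):
    """Return the NJ topology as nested tuples.

    leaves: list of hashable leaf labels.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(leaves)
    d = dict(dist)  # copy so the caller's matrix is not modified

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums used by the NJ selection criterion.
        row = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Siblinghood proposal: the pair minimizing Q(x, y).
        x, y = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * get(p[0], p[1]) - row[p[0]] - row[p[1]],
        )
        merged = (x, y)
        rest = [z for z in nodes if z != x and z != y]
        # Distances from the new node to every remaining node.
        for z in rest:
            d[frozenset((merged, z))] = 0.5 * (get(x, z) + get(y, z) - get(x, y))
        nodes = rest + [merged]
    return tuple(nodes)


# Additive distances for the quartet ((A,B),(C,D)) with leaf branch
# lengths A=1, B=2, C=3, D=4 and an internal branch of length 5.
pairs = {("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
         ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7}
dist = {frozenset(k): v for k, v in pairs.items()}
tree = neighbor_joining(["A", "B", "C", "D"], dist)
print(tree)
```

Each iteration scans all pairs, so this naive version takes O(N^2) work per join and O(N^3) overall, matching the running time quoted for NJ.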

19 NJMerge NOTE: Distance matrix is additive for (((A, B), (C, D)), E, (F, (G, H)));

20 NJMerge: Before accepting a siblinghood proposal, check whether it (1) violates a constraint tree or (2) makes the set of constraint trees incompatible.

22 NJMerge: Determining the compatibility of a set of unrooted phylogenetic trees is NP-complete [Steel, 1992; Warnow, 1994], so NJMerge uses a polynomial-time heuristic: update the constraint trees based on the siblinghood proposal (x, y), then test the compatibility of the constraint trees that contain leaf (x, y) in polynomial time [Aho et al., 1981]. Not every constraint tree will contain leaf (x, y), so it is possible for NJMerge to fail.
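The first part of the check, whether a proposal violates a single constraint tree, can be illustrated with a simplified sketch. It is not the NJMerge implementation and does not cover the compatibility heuristic across trees. It assumes binary constraint trees written as nested tuples, tracks each node by the set of original leaves it contains, and treats a proposal as a violation when the merged leaf set, restricted to the constraint tree's leaves, is a nontrivial split that is absent from that tree. All function names are hypothetical.

```python
def leaf_set(tree):
    """Leaves of a nested-tuple tree; any non-tuple value is a leaf label."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Leaf sets found below every node of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {frozenset([tree])}
    found = {leaf_set(tree)}
    for child in tree:
        found |= clades(child)
    return found


def violates_constraint(x_leaves, y_leaves, constraint_tree):
    """True if joining x and y contradicts the (binary) constraint tree."""
    all_leaves = leaf_set(constraint_tree)
    # Restrict the proposed cluster to leaves present in this constraint tree.
    side = (frozenset(x_leaves) | frozenset(y_leaves)) & all_leaves
    other = all_leaves - side
    if len(side) <= 1 or len(other) <= 1:
        return False  # trivial split: compatible with every tree
    tree_clades = clades(constraint_tree)
    # An edge of the unrooted tree shows up as a clade on one of its two sides.
    return side not in tree_clades and other not in tree_clades


constraint = ((("A", "B"), ("C", "D")), "E", ("F", ("G", "H")))
print(violates_constraint({"A"}, {"B"}, constraint))  # siblings in the tree
print(violates_constraint({"A"}, {"C"}, constraint))  # not siblings
```

Restricting to the constraint tree's own leaf set is what lets a proposal involving leaves outside that tree pass this particular test.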

23 Is NJMerge an effective approach in theory?

24 Correctness Guarantee (in preparation for journal extension). Theorem: Given constraint trees that agree with T and a distance matrix that is nearly additive for T, NJMerge returns T. Footnote 1: An n × n matrix M is nearly additive if each entry M[i, j] differs from the distance between leaf i and leaf j in T by less than one half of the shortest branch length in T.
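The nearly additive condition in the footnote translates directly into a check over all leaf pairs. The sketch below assumes both matrices are stored as dictionaries keyed by unordered leaf pairs and that the shortest branch length of T is supplied by the caller; the function name is hypothetical.

```python
def is_nearly_additive(estimated, tree_dist, shortest_branch):
    """Check |M[i, j] - D_T[i, j]| < shortest_branch / 2 for every leaf pair."""
    return all(
        abs(estimated[pair] - tree_dist[pair]) < shortest_branch / 2
        for pair in tree_dist
    )


# Path distances in the quartet ((A,B),(C,D)) with leaf branches
# A=1, B=2, C=3, D=4 and an internal branch of 5; shortest branch is 1.
pairs = {("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
         ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7}
tree_dist = {frozenset(k): v for k, v in pairs.items()}

small_noise = {pair: value + 0.25 for pair, value in tree_dist.items()}
large_noise = {pair: value + 0.75 for pair, value in tree_dist.items()}
print(is_nearly_additive(small_noise, tree_dist, shortest_branch=1))
print(is_nearly_additive(large_noise, tree_dist, shortest_branch=1))
```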

25 Statistical Consistency (in preparation for journal extension). Corollary: NJMerge can be used in gene tree estimation pipelines that are statistically consistent under the GTR model of sequence evolution.

26 Statistical Consistency (in preparation for journal extension). Corollary: NJMerge can be used in species tree estimation pipelines that are statistically consistent under the Multi-Species Coalescent model. Footnote 2: The proof assumes that both the number of genes and the number of sites per gene go to infinity.

27 Is NJMerge an effective approach in practice?

28 Species Tree Estimation

29 Gene tree matching the species tree

30 Gene tree differing from the species tree due to Incomplete Lineage Sorting (ILS)

31 Species Tree Estimation Pipelines

36 Species Tree Estimation Pipelines using NJMerge. Distance matrix: 1) Estimate gene trees using FastTree-2 [Price et al., 2010]. 2) Compute the average number of edges between pairs of species [e.g., Liu and Yu, 2011]. Constraint trees: ASTRAL-III; unpartitioned concatenation using RAxML.
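Step 2 of the distance matrix computation can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the talk: gene trees are nested tuples with one leaf per species, a root with exactly two children is treated as suppressed (so the tree is read as unrooted), and species pairs are averaged over the gene trees in which both appear. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def internode_distances(tree):
    """Number of edges between every pair of leaves in one unrooted tree."""
    dist = {}

    def visit(node, is_root):
        if not isinstance(node, tuple):
            return {node: 0}
        # Depth of each leaf below this node, one dict per child subtree.
        child_depths = [
            {leaf: depth + 1 for leaf, depth in visit(child, False).items()}
            for child in node
        ]
        # A degree-2 root is not a real node of the unrooted tree.
        adjust = 1 if is_root and len(node) == 2 else 0
        for left, right in combinations(child_depths, 2):
            for a, depth_a in left.items():
                for b, depth_b in right.items():
                    dist[frozenset((a, b))] = depth_a + depth_b - adjust
        merged = {}
        for depths in child_depths:
            merged.update(depths)
        return merged

    visit(tree, True)
    return dist


def average_internode_distances(gene_trees):
    """Average the per-tree edge counts for each species pair."""
    totals, counts = defaultdict(int), defaultdict(int)
    for tree in gene_trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] += d
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


gene_trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
avg = average_internode_distances(gene_trees)
print(avg[frozenset(("A", "B"))], avg[frozenset(("A", "D"))])
```

The resulting dictionary has the same shape as the distance input assumed by a pairwise-keyed NJ routine, so it could be passed on without conversion.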

37 Evaluation Metrics: Robinson-Foulds (RF) distance between the true and the estimated species trees; running time. All methods were limited to a single compute node with 16 cores, 64 GB of physical memory, and 48 hours maximum wall-clock time.
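For readers unfamiliar with the RF metric, the sketch below computes it as the number of nontrivial bipartitions found in exactly one of the two trees. It assumes both trees are nested tuples on the same leaf set with mutually comparable labels (for example strings); the normalization by 2(n-3) assumes binary trees. The function names are hypothetical, and this is not the evaluation code used in the study.

```python
def leaf_set(tree):
    """Leaves of a nested-tuple tree; any non-tuple value is a leaf label."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Nontrivial splits, each stored as the side without a reference leaf."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    splits = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Normalize so the same edge always yields the same frozenset.
        side = all_leaves - below if reference in below else below
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Count bipartitions present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def normalized_rf(tree_a, tree_b):
    """RF distance divided by its maximum, 2(n - 3), for binary trees."""
    n = len(leaf_set(tree_a))
    return rf_distance(tree_a, tree_b) / (2 * (n - 3))


true_tree = (("A", "B"), "C", ("D", "E"))
estimated = (("A", "C"), "B", ("D", "E"))
print(rf_distance(true_tree, estimated), normalized_rf(true_tree, estimated))
```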

38 Running time: $T = (T_1^{\text{Method}} + \cdots + T_k^{\text{Method}}) + T^{\text{NJMerge}}$, where $k$ is the number of subsets, $T_i^{\text{Method}}$ is the time to compute a tree on subset $i$ for the given method, and $T^{\text{NJMerge}}$ is the time to run NJMerge.

39 Case 1: ASTRAL-III [Mirarab et al., 2014; Mirarab and Warnow, 2015; Zhang et al., 2018]. Uses dynamic programming to solve the Maximum Quartet Consistency problem within a constrained search space. Statistically consistent under the Multi-Species Coalescent model (assuming true gene trees).

40 Case 1: ASTRAL-III on 1000-taxon, 1000-gene datasets. [Figure: running time (m) by method (ASTRAL, NJMerge+ASTRAL); panels: Moderate ILS, Very High ILS.] NOTE: ASTRAL did not complete within 48 hours on 4/20 datasets with high ILS.

41 Case 1: ASTRAL-III on 1000-taxon, 1000-gene datasets. [Figure: species tree error by method (ASTRAL, NJMerge+ASTRAL); panels: Moderate ILS, Very High ILS.] NOTE: ASTRAL did not complete within 48 hours on 4/20 datasets with high ILS.

42 Case 2: RAxML [Stamatakis, 2014]. Uses heuristics to search for the tree with the best likelihood score. Not statistically consistent under the Multi-Species Coalescent model [Roch and Steel, 2015].

43 Case 2: RAxML. [Figure: bar graphs of running time (m) by level of ILS (Moderate, Very High) for RAxML and NJMerge+RAxML; panels: 100 species, 1000 introns; 1000 species, 1000 introns.] NOTE: Bar graphs show all datasets on which at least one method completed.

44 Case 2: RAxML on 1000-taxon, 1000-gene datasets. Because of out-of-memory errors, RAxML could run on 0/20 datasets with moderate ILS and 1/20 datasets with high ILS. On the one dataset on which RAxML could run, it did not complete within 48 hours. In contrast, NJMerge+RAxML completed on 40/40 of these datasets in less than 30 hours (often in less than 17 hours).

45 Case 2: RAxML. [Figure: box plots of species tree error by level of ILS (Moderate, Very High) for RAxML and NJMerge+RAxML; panels: 100 species, 1000 introns; 1000 species, 1000 introns.] NOTE: Box plots show all datasets on which at least one method completed.

46 Conclusions. NJMerge: runs in polynomial time; can be used to build statistically consistent divide-and-conquer pipelines; enabled ASTRAL-III and RAxML to analyze 1000-taxon datasets using a single compute node with 16 cores, 64 GB of physical memory, and 48 hours maximum wall-clock time; never failed on 1000-taxon datasets; failed on less than 1% of all datasets tested.

47 Funding: This work was supported by the U.S. National Science Foundation Graduate Research Fellowship Program (DGE ), the Blue Waters Sustained-Petascale Computing Project (OCI and ACI ), Graph-Theoretic Algorithms to Improve Phylogenomic Analyses (CCF ), and the Ira & Debra Cohen Graduate Fellowship in Computer Science.

48 Which species tree method should I use on 1000-taxon, 1000-gene datasets? [Figure: species tree error by method (NJMerge+ASTRAL, NJMerge+RAxML, NJMerge+SVDquartets, NJst); panels: Moderate ILS, Very High ILS.]

49 How does the estimated distance matrix impact NJMerge? [Figure: species tree error versus number of genes for NJst and NJMerge+True; panels: 100 taxa, Moderate ILS; 100 taxa, Very High ILS.]

50 References. Nakhleh et al. (2001). Designing fast converging phylogenetic methods. Bioinformatics 17(Suppl. 1):S190-S198. Nelesen et al. (2012). DACTAL: Divide-And-Conquer Trees (almost) without Alignments. Bioinformatics 28(12):i274-i282. Bayzid et al. (2014). Disk Covering Methods Improve Phylogenomic Analyses. BMC Genomics 15(Suppl 6):S7. Jiang et al. (2001). A polynomial time approximation scheme for inferring evolutionary trees from quartet topologies and its application. SIAM Journal on Computing 30(6).

51 References. Bansal et al. (2010). Robinson-Foulds Supertrees. Algorithms for Molecular Biology 5(1). Saitou and Nei. (1987). The neighbor-joining method: a new method for reconstruction of phylogenetic trees. Molecular Biology and Evolution 4. Steel. (1992). The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification 9. Warnow. (1994). Tree compatibility and inferring evolutionary history. Journal of Algorithms 16(3).

52 References. Aho et al. (1981). Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions. SIAM Journal on Computing 10(3). Atteson. (1999). The Performance of Neighbor-Joining Methods of Phylogenetic Reconstruction. Algorithmica 25(2-3). Price et al. (2010). FastTree 2 - Approximately Maximum Likelihood Trees for Large Alignments. PLoS ONE 5(3):1-10. Liu and Yu. (2011). Estimating Species Trees from Unrooted Gene Trees. Systematic Biology 60(5).

53 References. Mirarab et al. (2015). PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences. Journal of Computational Biology 22(5). Mirarab et al. (2014). ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17):i541-i548. Mirarab and Warnow. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12):i44-i52. Zhang et al. (2017). ASTRAL-III: Increased Scalability and Impacts of Contracting Low Support Branches. In Meidanis and Nakhleh, editors, Comparative Genomics. RECOMB-CG. Lecture Notes in Computer Science. Springer, Cham.

54 References. Stamatakis. (2014). RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 30(9). Roch and Steel. (2015). Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theoretical Population Biology 100C. Chifman and Kubatko. (2014). Quartet Inference from SNP Data Under the Coalescent Model. Bioinformatics 30(23).

55 Correctness Guarantee (in preparation for journal extension). Proof: NJ applied to a nearly additive distance matrix for T will return T [Atteson, 1999]. Because all constraint trees agree with T, the siblinghood proposals indicated by NJ will never violate a constraint tree, so NJMerge will simply operate as traditional NJ and return T.

56 Species Tree Estimation Pipelines using NJMerge

57 Simulated Datasets. Dataset size: 100 and 1000 species; 1000 genes varying in length from 300 to 1500 bp. Level of Incomplete Lineage Sorting (ILS): Moderate, 8-10% AD; Very high, 68-69% AD, where AD is the average (RF) distance between the true species tree and the true gene trees.

58 ASTRAL [Mirarab et al., 2014; Mirarab and Warnow, 2015; Zhang et al., 2018]. Uses dynamic programming to solve the Maximum Quartet Consistency problem within a constrained search space X. Runs in $O(nm|X|^{1.726})$ time, where n is the number of species and m is the number of genes. For ASTRAL-III, $|X| = O(nm)$, so it runs in $O((nm)^{2.726})$ time.


BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Distances that Perfectly Mislead

Distances that Perfectly Mislead Syst. Biol. 53(2):327 332, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490423809 Distances that Perfectly Mislead DANIEL H. HUSON 1 AND

More information

An Investigation of Phylogenetic Likelihood Methods

An Investigation of Phylogenetic Likelihood Methods An Investigation of Phylogenetic Likelihood Methods Tiffani L. Williams and Bernard M.E. Moret Department of Computer Science University of New Mexico Albuquerque, NM 87131-1386 Email: tlw,moret @cs.unm.edu

More information

Inferring a level-1 phylogenetic network from a dense set of rooted triplets

Inferring a level-1 phylogenetic network from a dense set of rooted triplets Theoretical Computer Science 363 (2006) 60 68 www.elsevier.com/locate/tcs Inferring a level-1 phylogenetic network from a dense set of rooted triplets Jesper Jansson a,, Wing-Kin Sung a,b, a School of

More information

Phylogeny Tree Algorithms

Phylogeny Tree Algorithms Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

More information

arxiv: v2 [q-bio.pe] 4 Feb 2016

arxiv: v2 [q-bio.pe] 4 Feb 2016 Annals of the Institute of Statistical Mathematics manuscript No. (will be inserted by the editor) Distributions of topological tree metrics between a species tree and a gene tree Jing Xi Jin Xie Ruriko

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Isolating - A New Resampling Method for Gene Order Data

Isolating - A New Resampling Method for Gene Order Data Isolating - A New Resampling Method for Gene Order Data Jian Shi, William Arndt, Fei Hu and Jijun Tang Abstract The purpose of using resampling methods on phylogenetic data is to estimate the confidence

More information

Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions

Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions PLGW05 Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions 1 joint work with Ilan Gronau 2, Shlomo Moran 3, and Irad Yavneh 3 1 2 Dept. of Biological Statistics and Computational

More information

Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study

Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study Li-San Wang Robert K. Jansen Dept. of Computer Sciences Section of Integrative Biology University of Texas, Austin,

More information

Properties of Consensus Methods for Inferring Species Trees from Gene Trees

Properties of Consensus Methods for Inferring Species Trees from Gene Trees Syst. Biol. 58(1):35 54, 2009 Copyright c Society of Systematic Biologists DOI:10.1093/sysbio/syp008 Properties of Consensus Methods for Inferring Species Trees from Gene Trees JAMES H. DEGNAN 1,4,,MICHAEL

More information

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW

In comparisons of genomic sequences from multiple species, Challenges in Species Tree Estimation Under the Multispecies Coalescent Model REVIEW REVIEW Challenges in Species Tree Estimation Under the Multispecies Coalescent Model Bo Xu* and Ziheng Yang*,,1 *Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China and Department

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel

Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel SUPERTREE ALGORITHMS FOR ANCESTRAL DIVERGENCE DATES AND NESTED TAXA Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel Department of Mathematics and Statistics University of Canterbury

More information

Theory of Evolution. Charles Darwin

Theory of Evolution. Charles Darwin Theory of Evolution harles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (8-6) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

A Minimum Spanning Tree Framework for Inferring Phylogenies

A Minimum Spanning Tree Framework for Inferring Phylogenies A Minimum Spanning Tree Framework for Inferring Phylogenies Daniel Giannico Adkins Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-010-157

More information

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu

More information

BIOINFORMATICS DISCOVERY NOTE

BIOINFORMATICS DISCOVERY NOTE BIOINFORMATICS DISCOVERY NOTE Designing Fast Converging Phylogenetic Methods!" #%$&('$*),+"-%./ 0/132-%$ 0*)543768$'9;:(0'=A@B2$0*)A@B'9;9CD

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS

RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS KT Huber, V Moulton, C Semple, and M Steel Department of Mathematics and Statistics University of Canterbury Private Bag 4800 Christchurch,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

arxiv: v1 [q-bio.pe] 1 Jun 2014

arxiv: v1 [q-bio.pe] 1 Jun 2014 THE MOST PARSIMONIOUS TREE FOR RANDOM DATA MAREIKE FISCHER, MICHELLE GALLA, LINA HERBST AND MIKE STEEL arxiv:46.27v [q-bio.pe] Jun 24 Abstract. Applying a method to reconstruct a phylogenetic tree from

More information

arxiv: v1 [cs.cc] 9 Oct 2014

arxiv: v1 [cs.cc] 9 Oct 2014 Satisfying ternary permutation constraints by multiple linear orders or phylogenetic trees Leo van Iersel, Steven Kelk, Nela Lekić, Simone Linz May 7, 08 arxiv:40.7v [cs.cc] 9 Oct 04 Abstract A ternary

More information

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data Mary E. Cosner Dept. of Plant Biology Ohio State University Li-San Wang Dept.

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

arxiv: v1 [q-bio.pe] 6 Jun 2013

arxiv: v1 [q-bio.pe] 6 Jun 2013 Hide and see: placing and finding an optimal tree for thousands of homoplasy-rich sequences Dietrich Radel 1, Andreas Sand 2,3, and Mie Steel 1, 1 Biomathematics Research Centre, University of Canterbury,

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Regular networks are determined by their trees

Regular networks are determined by their trees Regular networks are determined by their trees Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu February 17, 2009 Abstract. A rooted acyclic digraph

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting arxiv:1509.06075v3 [q-bio.pe] 12 Feb 2016 Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting Claudia Solís-Lemus 1 and Cécile Ané 1,2 1 Department of Statistics,

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

arxiv: v1 [cs.ds] 1 Nov 2018

arxiv: v1 [cs.ds] 1 Nov 2018 An O(nlogn) time Algorithm for computing the Path-length Distance between Trees arxiv:1811.00619v1 [cs.ds] 1 Nov 2018 David Bryant Celine Scornavacca November 5, 2018 Abstract Tree comparison metrics have

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis. Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science

From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis. Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science!1 Adapted from U.S. Department of Energy Genomic Science

More information

Consensus properties for the deep coalescence problem and their application for scalable tree search

Consensus properties for the deep coalescence problem and their application for scalable tree search PROCEEDINGS Open Access Consensus properties for the deep coalescence problem and their application for scalable tree search Harris T Lin 1, J Gordon Burleigh 2, Oliver Eulenstein 1* From 7th International

More information

Point of View. Why Concatenation Fails Near the Anomaly Zone

Point of View. Why Concatenation Fails Near the Anomaly Zone Point of View Syst. Biol. 67():58 69, 208 The Author(s) 207. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email:

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2 WenEtAl-biorxiv 0// 0: page # Inferring Phylogenetic Networks Using PhyloNet Dingqiao Wen, Yun Yu, Jiafan Zhu, Luay Nakhleh,, Computer Science, Rice University, Houston, TX, USA; BioSciences, Rice University,

More information

Supplementary Information

Supplementary Information Supplementary Information For the article"comparable system-level organization of Archaea and ukaryotes" by J. Podani, Z. N. Oltvai, H. Jeong, B. Tombor, A.-L. Barabási, and. Szathmáry (reference numbers

More information

Inference of Parsimonious Species Phylogenies from Multi-locus Data

Inference of Parsimonious Species Phylogenies from Multi-locus Data RICE UNIVERSITY Inference of Parsimonious Species Phylogenies from Multi-locus Data by Cuong V. Than A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE Doctor of Philosophy APPROVED,

More information

Phylogenetics: Parsimony

Phylogenetics: Parsimony 1 Phylogenetics: Parsimony COMP 571 Luay Nakhleh, Rice University he Problem 2 Input: Multiple alignment of a set S of sequences Output: ree leaf-labeled with S Assumptions Characters are mutually independent

More information