Recent Advances in Phylogeny Reconstruction

1 Recent Advances in Phylogeny Reconstruction from Gene-Order Data
Bernard M.E. Moret
Department of Computer Science, University of New Mexico, Albuquerque, NM
Department Colloquium p.1/41

2 Collaborators and Support
Collaborators:
  University of Texas, Austin: Tandy Warnow (Computer Science); David Hillis, Robert Jansen, Randy Linder (Biology)
  University of New Mexico: David Bader (Electrical & Comp. Eng.)
Funding: National Science Foundation
  at UNM: 6 grants for $2 million over 5 years
  with UT Austin: 10 grants for $8 million

8 Overview
  Phylogenies
  Gene-order data: mitochondrion and chloroplast genomes
  Inversion and other genomic distance measures
  Estimating the true evolutionary distance
  Fast convergence for reconstruction methods
  GRAPPA news

9 Phylogenies
A phylogeny is a reconstruction of the evolutionary history of a collection of organisms; it usually takes the form of a tree.
Modern organisms are placed at the leaves and ancestral organisms occupy internal nodes.
The edges of the tree denote evolutionary relationships.

10 12 Species of Campanulaceae
[Figure: phylogenetic tree. Taxa: Wahlenbergia, Merciera, Trachelium, Symphyandra, Campanula, Adenophora, Legousia, Asyneuma, Triodanus, Codonopsis, Cyananthus, Platycodon, plus Tobacco]

11 Herpes Viruses that Affect Humans
[Figure: phylogenetic tree. Taxa: HVS, EHV2, KHSV, EBV, HSV1, HSV2, PRV, EHV1, HHV6, VZV, HHV7, HCMV]

12 A Large Phylogeny: 500 Green Plants

18 Reconstructing Phylogenies
Reconstructing phylogenies is a major component of modern research programs in many areas of biology and medicine:
  pharmaceutical research for drug discovery (most famous is herbicide Roundup™)
  understanding rapidly mutating viruses (HIV)
  designing enhanced organisms (rice, wheat)
  explaining and predicting gene expression
  explaining and predicting ligands
  most centrally, understanding genomic evolution

19 Reconstructing Phylogenies (cont'd)
  Requires a model of tree evolution (e.g., random or birth-death)
  Requires a model of DNA/RNA/codon/gene order/etc. evolution (e.g., Markov models with weight matrices such as Jukes-Cantor and Kimura)
  Requires an optimization criterion that relates to the previous two models (e.g., likelihood or parsimony)
  Requires data with sufficient signal (to recover defining information)

21 Computational Phylogenetics
  Is extremely computation-intensive.
  Is viewed very differently by biologists (one dataset only, accuracy first) and by computer scientists (efficiency first).
  Sequence data (RNA, DNA, and amino acid) has been used for over 20 years and is fairly well understood, but methods do not scale up.
  Genomic data (gene order and content of whole genomes) provides new information, but is much harder to analyze than sequence data.

22 Gene-Order Data
Certain genomes evolve mostly through rearrangement of the order of genes, with occasional gene duplication or gene loss.
A chloroplast is a semi-independent organism that lives within plant cells and allows them to photosynthesize. Chloroplasts have one circular chromosome with 120 genes.
A mitochondrion is a semi-independent organism that lives within animal and some plant cells and supplies them with energy. Mitochondria have one circular chromosome with 40 genes in animals, more in plants.

23 Mitochondria
[Figure: mitochondria of Homo sapiens, Felis catus, Lumbricus terrestris, Saccharomyces cerevisiae]

24 Chloroplasts
[Figure: chloroplasts of Cyanidium caldarium, Zea mays]

25 Phylogenies from Gene-Order Data
Optimization target: reconstruct the phylogeny with the least total number of genomic changes.
An application of Occam's razor; biologists call this the principle of parsimony.

26 True Evolutionary Distances
True Evolutionary Distance (T.E.D.): actual number of events along an edge of the tree.
Edit Distance: minimum number of events from one end of a tree edge to the other.
We obtain better topological accuracy with T.E.D.s than with Edit Distances.
T.E.D. can only be estimated.

27 True Evolutionary Distance
[Figure: four taxa A, B, C, D in several arrangements, with the label Polynomial Time]
The tree and, a fortiori, its edge lengths are not known.

28 Rearrangement Events
  Transposition
  Inversion
  Inverted Transposition
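The three event types can be sketched on a signed gene order. This is a minimal illustration, assuming a genome is stored as a Python list of signed integers in a linear representation (the circular wrap-around is ignored); the function names and the index convention (half-open slices) are choices made here, not taken from the slides.

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of every gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Swap the adjacent segments genome[i:j] and genome[j:k] (requires i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Move genome[i:j] to just after genome[j:k], inverting it on the way."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


g = [1, 2, 3, 4, 5, 6]
print(inversion(g, 1, 4))                  # [1, -4, -3, -2, 5, 6]
print(transposition(g, 1, 3, 5))           # [1, 4, 5, 2, 3, 6]
print(inverted_transposition(g, 1, 3, 5))  # [1, 4, 5, -3, -2, 6]
```

Each function returns a new list, so the original gene order is left untouched and events can be chained.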

29 Generalized Nadeau-Taylor Model
  Inversions, Transpositions, and Inverted Transpositions
  All events of the same type are equiprobable
  Assign probabilities to different event types:
    Transposition: α
    Inverted Transposition: β
    Inversion: 1 − α − β
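A small simulator makes the model concrete. The sketch below is an assumption-laden illustration, not the author's simulator: genomes are linear lists of signed integers, cut points are drawn uniformly at random, and for an inverted transposition a coin flip decides which of the two swapped segments is inverted. The names `random_event` and `evolve` are hypothetical.

```python
import random


def random_event(genome, alpha, beta, rng):
    """Apply one event: transposition with probability alpha, inverted
    transposition with probability beta, inversion with probability 1 - alpha - beta."""
    n = len(genome)
    r = rng.random()
    if r < alpha + beta:
        # Three distinct cut points define two adjacent segments to swap.
        i, j, k = sorted(rng.sample(range(n + 1), 3))
        first, second = genome[i:j], genome[j:k]
        if r >= alpha:
            # Inverted transposition: one of the two segments is also inverted
            # (which one is chosen by a fair coin; an assumption of this sketch).
            if rng.random() < 0.5:
                first = [-g for g in reversed(first)]
            else:
                second = [-g for g in reversed(second)]
        return genome[:i] + second + first + genome[k:]
    # Inversion: two distinct cut points define a non-empty segment.
    i, j = sorted(rng.sample(range(n + 1), 2))
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def evolve(genome, k, alpha, beta, seed=0):
    """Apply k random events to a copy of the genome."""
    rng = random.Random(seed)
    genome = list(genome)
    for _ in range(k):
        genome = random_event(genome, alpha, beta, rng)
    return genome


start = list(range(1, 121))            # 120 genes, as in the experiments
print(evolve(start, 10, 1 / 3, 1 / 3)[:12])
```

Setting `alpha = beta = 0` gives the inversion-only scenario, and `alpha = beta = 1/3` gives the scenario in which all three classes are equiprobable.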

30 Breakpoint Distance
$D_{BP}(G, G')$ = number of breakpoints in $G$ with respect to $G'$
[Example: genomes G = ( ) and G' = ( ); the gene orders were lost in transcription]
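The breakpoint count can be sketched in a few lines. This version assumes signed circular genomes with identical gene content, and uses the usual convention that an adjacency (a, b) is the same as (-b, -a) read on the opposite strand; the function name is hypothetical.

```python
def breakpoint_distance(g, h):
    """Number of adjacencies of signed circular genome g that do not occur in h."""
    n = len(h)
    adjacencies = set()
    for i in range(n):
        a, b = h[i], h[(i + 1) % n]   # wrap around: the genome is circular
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))     # same adjacency read in the other direction
    m = len(g)
    return sum((g[i], g[(i + 1) % m]) not in adjacencies for i in range(m))


identity = [1, 2, 3, 4, 5, 6]
one_inversion = [1, -4, -3, -2, 5, 6]
print(breakpoint_distance(identity, identity))       # 0
print(breakpoint_distance(identity, one_inversion))  # 2
```

A single inversion creates at most two breakpoints, which is why the breakpoint distance underestimates the number of events once rearrangements start to overlap.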

31 Genomic Distances
  BP: Breakpoint distance.
  INV [Moret, Bader, Yan WADS 2001]: Minimum number of inversions required to transform one genome to another.
  IEBP [Wang, Warnow STOC '01]: Approximate the expected breakpoint distance with provable error.
  Exact IEBP [Wang WABI '01]: Invert the expected breakpoint distance.
  EDE [Moret, Wang, Warnow, Wyman ISMB '01]: Estimate the expected inversion distance using simulation data.

32 Exact IEBP: Basic Idea
Let $G_0$ be the starting genome and $G_k$ be the genome after $k$ events.
For every $k > 0$, compute $E[D_{BP}(G_k, G_0)]$, the expected number of breakpoints after $k$ events.
Return the $k$ that minimizes $|E[D_{BP}(G_k, G_0)] - D_{BP}(G, G')|$.
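The final step, inverting the expectation curve, can be sketched independently of how the expectation is computed. In the code below, `expected_bp` is any caller-supplied function of k; the saturating curve used in the example is a made-up stand-in chosen only to exercise the search, not the expectation derived in Exact-IEBP.

```python
def estimate_events(expected_bp, observed_bp, k_max):
    """Return the k in 1..k_max minimizing |expected_bp(k) - observed_bp|.

    Ties are resolved in favor of the smallest k.
    """
    return min(range(1, k_max + 1),
               key=lambda k: abs(expected_bp(k) - observed_bp))


def toy_curve(k, n=120):
    """Hypothetical saturating curve, for illustration only."""
    return n * (1.0 - (1.0 - 2.0 / n) ** k)


# Estimate the number of events behind an observed breakpoint distance of 40.
print(estimate_events(toy_curve, 40, k_max=500))
```

Because the expected breakpoint distance levels off as k grows, the inversion becomes ill-conditioned near saturation, so `k_max` acts as a practical cap on the estimate.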

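33 The Counting Lemma
\[
\iota_n(u,v) =
\begin{cases}
\min\{\,|u|-1,\ |v|-1,\ n+1-|u|,\ n+1-|v|\,\} & \text{if } uv < 0 \\
0 & \text{if } u \neq v,\ uv > 0 \\
\binom{|u|-1}{2} + \binom{n+1-|u|}{2} & \text{if } u = v
\end{cases}
\]
\[
\tau_n(u,v) =
\begin{cases}
0 & \text{if } uv < 0 \\
(\min\{|u|,|v|\} - 1)\,(n+1-\max\{|u|,|v|\}) & \text{if } u \neq v,\ uv > 0 \\
\binom{n+1-|u|}{3} + \binom{|u|-1}{3} & \text{if } u = v
\end{cases}
\]
\[
\nu_n(u,v) =
\begin{cases}
(n-2)\,\iota_n(u,v) & \text{if } uv < 0 \\
\tau_n(u,v) & \text{if } u \neq v,\ uv > 0 \\
3\,\tau_n(u,v) & \text{if } u = v
\end{cases}
\]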

34 Goodness of Fit of Distance Estimators
Inversion only on 120 genes
[Figure: three panels (Inversion distance, Breakpoint distance, Exact-IEBP distance), each plotting the measured distance and the ideal estimator against the actual number of events]

35 Goodness of Fit of Distance Estimators
Inversion only on 120 genes
[Figure: three panels (IEBP distance, EDE distance, Exact-IEBP distance), each plotting the measured distance and the ideal estimator against the actual number of events]

36 Absolute Error of Distance Estimators
Inversion only
[Figure: absolute difference versus actual number of events for BP, INV, IEBP, EDE, and Exact IEBP]

37 Absolute Error of Distance Estimators
Transpositions only
[Figure: absolute difference versus actual number of events for BP, INV, IEBP, EDE, and Exact IEBP]

38 Absolute Error of Distance Estimators
All three classes equiprobable
[Figure: absolute difference versus actual number of events for BP, INV, IEBP, EDE, and Exact IEBP]

39 Accuracy of Neighbor Joining
120 genes, inversion only, 10/20/40/80/160 genomes
[Figure: false negative rate (%) versus normalized maximum pairwise inversion distance for NJ(BP), NJ(INV), NJ(IEBP), NJ(EDE), and NJ(Exact IEBP)]

40 Accuracy of Neighbor Joining
120 genes, equiprobable events, 10/20/40/80/160 genomes
[Figure: false negative rate (%) versus normalized maximum pairwise inversion distance for NJ(BP), NJ(INV), NJ(IEBP), NJ(EDE), and NJ(Exact IEBP)]

41 Robustness of Exact-IEBP
120 genes, inversion only, 10/20/40/80/160 genomes
[Figure: false negative rate (%) versus normalized maximum pairwise inversion distance for NJ(Exact IEBP(0,0)), NJ(Exact IEBP(1,0)), and NJ(Exact IEBP(1/3,1/3))]

42 Robustness of Exact-IEBP
120 genes, equiprobable events, 10/20/40/80/160 genomes
[Figure: false negative rate (%) versus normalized maximum pairwise inversion distance for NJ(Exact IEBP(0,0)), NJ(Exact IEBP(1,0)), and NJ(Exact IEBP(1/3,1/3))]

47 Convergence Rate
A method is statistically consistent for a given model if, given long enough data sequences, it recovers the true tree with high probability.
Problem: "long enough" sequences may not exist in nature.
Solution: a method is fast-converging for a given model if, given sequences of polynomial length, it recovers the true tree with high probability.
Problem: the model conditions may not hold.
Solution: a method is absolute fast-converging if, given sequences of polynomial length, it recovers the true tree with high probability.

48 Known Fast-Converging Methods
  The short-quartet methods [Warnow et al.]: absolute fast-converging
  The disk-covering methods (DCM) [Warnow et al.]: absolute fast-converging
  The harmonic greedy triplet method [Kao et al.]
  The method of Cryan, Goldberg, and Goldberg
  DCM-boosted neighbor-joining [Warnow et al.]

49 New Results [Warnow, Moret, St. John SODA '01]
  New absolute fast-converging method: weighted witness-antiwitness method (WIGWAM)
  Decision procedure to turn fast-converging methods into absolute fast-converging methods: short-quartet support (SQS)
  Boosting method (DCM plus SQS) to turn many methods with exponential convergence (e.g., neighbor-joining) into absolute fast-converging ones
  Generalizations to families of boosting methods with same properties, but experimental behavior

50 What is a Quartet?
A quartet is an unrooted binary tree on four taxa: the smallest tree that induces a nontrivial bipartition.
[Figure: the three quartets on taxa a, b, c, d: {ab|cd}, {ac|bd}, {ad|bc}]
A quartet {ab|cd} agrees with a tree T if the subtree induced in T by the four taxa is the quartet itself.
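Checking whether a quartet agrees with a tree can be sketched as follows. The representation is an assumption of this example: an unrooted binary tree is an adjacency dictionary, and a quartet {ab|cd} is a frozenset of two frozensets. The induced quartet is read off from hop counts using the four-point condition (the pairing with the smallest sum of within-pair path lengths is the induced split).

```python
from collections import deque


def hop_distances(tree, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in tree[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def make_quartet(a, b, c, d):
    """Quartet {ab|cd} as an unordered pair of unordered pairs."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def induced_quartet(tree, a, b, c, d):
    """Quartet induced by the tree on leaves a, b, c, d (None if unresolved)."""
    dist = {x: hop_distances(tree, x) for x in (a, b, c, d)}
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    sums = [dist[p[0]][p[1]] + dist[q[0]][q[1]] for p, q in pairings]
    best = min(sums)
    if sums.count(best) > 1:
        return None  # star-like subtree: no pairing is strictly best
    p, q = pairings[sums.index(best)]
    return make_quartet(p[0], p[1], q[0], q[1])


def agrees(quartet, tree):
    """True if the subtree induced by the quartet's four taxa is the quartet itself."""
    taxa = sorted(x for side in quartet for x in side)
    return induced_quartet(tree, *taxa) == quartet


# Example tree ((a,b),c,(d,e)) with internal nodes u, v, w.
tree = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["w"], "e": ["w"],
    "u": ["a", "b", "v"], "v": ["u", "c", "w"], "w": ["v", "d", "e"],
}
print(agrees(make_quartet("a", "b", "c", "d"), tree))  # True
print(agrees(make_quartet("a", "c", "b", "d"), tree))  # False
```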

51 Fast Convergence: Decision Problem
TRUE TREE SELECTION PROBLEM:
Input: A set $S$ of sequences over {A, C, T, G} generated on an unknown tree $(T, M)$, and a collection $\mathcal{T} = \{T_1, T_2, \ldots, T_p\}$ of phylogenies on $S$.
Output: The true tree $T$ if $T$ is in $\mathcal{T}$.

52 Quartet Support
Let $T$ be a fixed tree leaf-labelled by the set $S$.
Let $Q$ be a fixed set of quartets on $S$.
Let $D$ be the distance matrix on $S$.
The support of $T$ with respect to $Q$ is
\[
\max\{\, l : (q \in Q \text{ and } \mathrm{diam}_D(q) \le l) \Rightarrow q \in Q(T) \,\}
\]
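The support can be sketched directly from this definition. Several choices below are assumptions made for the example: quartets are frozensets of two frozensets, `D` is a symmetric dictionary of dictionaries, the candidate values of l are restricted to the quartet diameters that actually occur in Q, and 0 is returned when even the shortest quartet disagrees with the tree.

```python
from itertools import combinations


def taxa_of(quartet):
    """The four taxa of a quartet stored as a frozenset of two frozensets."""
    return frozenset().union(*quartet)


def diameter(quartet, D):
    """diam_D(q): largest pairwise distance among the quartet's four taxa."""
    return max(D[x][y] for x, y in combinations(sorted(taxa_of(quartet)), 2))


def support(tree_quartets, Q, D):
    """Largest diameter l such that every quartet of Q with diameter <= l is in Q(T)."""
    diameters = sorted({diameter(q, D) for q in Q})
    conflicting = [diameter(q, D) for q in Q if q not in tree_quartets]
    if not conflicting:
        return diameters[-1] if diameters else 0
    limit = min(conflicting)
    admissible = [l for l in diameters if l < limit]
    return admissible[-1] if admissible else 0


def make_quartet(a, b, c, d):
    return frozenset([frozenset([a, b]), frozenset([c, d])])


# Toy distances: taxa placed on a line at hypothetical coordinates.
coords = {"a": 0, "b": 1, "c": 3, "d": 6, "e": 10}
D = {x: {y: abs(coords[x] - coords[y]) for y in coords} for x in coords}

q1 = make_quartet("a", "b", "c", "d")   # diameter 6
q2 = make_quartet("b", "c", "d", "e")   # diameter 9
q3 = make_quartet("a", "b", "c", "e")   # diameter 10
Q = {q1, q2, q3}

print(support({q1, q2}, Q, D))      # 9: the first conflict appears at diameter 10
print(support({q3}, Q, D))          # 0: the shortest quartet already conflicts
print(support({q1, q2, q3}, Q, D))  # 10: no conflict at all
```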

53 Short Quartet Support
PROCEDURE SQS($\mathcal{T}$, $S$)
For each set of four taxa from $S$, compute the neighbor-joining quartet $q$; let $Q$ be the set of all such quartets.
Return $T_i$ such that $s(T_i, Q)$ is maximum; if more than one such tree exists, return the one with the smallest index $i$.
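The procedure can be sketched end to end under some simplifying assumptions: the distance matrix `D` is given rather than estimated from sequences, each candidate tree is supplied as the set of quartets it induces, and the neighbor-joining quartet is computed with the four-point method (on four taxa, the pair that neighbor-joining merges is the one minimizing the sum of within-pair distances). All function names are hypothetical.

```python
from itertools import combinations


def make_quartet(a, b, c, d):
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def four_point_quartet(a, b, c, d, D):
    """Quartet on a, b, c, d whose two within-pair distances have the smallest sum."""
    options = [
        (D[a][b] + D[c][d], make_quartet(a, b, c, d)),
        (D[a][c] + D[b][d], make_quartet(a, c, b, d)),
        (D[a][d] + D[b][c], make_quartet(a, d, b, c)),
    ]
    return min(options, key=lambda option: option[0])[1]


def support(tree_quartets, Q):
    """Largest diameter below the first conflicting quartet (0 if there is none).

    Q maps each quartet to its diameter.
    """
    conflicting = [diam for q, diam in Q.items() if q not in tree_quartets]
    limit = min(conflicting) if conflicting else float("inf")
    return max((diam for diam in Q.values() if diam < limit), default=0)


def sqs(trees, taxa, D):
    """Index of the candidate tree with maximum support (smallest index on ties)."""
    Q = {}
    for four in combinations(sorted(taxa), 4):
        q = four_point_quartet(*four, D)
        Q[q] = max(D[x][y] for x, y in combinations(four, 2))
    scores = [support(tree_quartets, Q) for tree_quartets in trees]
    return scores.index(max(scores))


# Path lengths on the tree ((a,b),c,(d,e)) with unit edge lengths.
pairs = {("a", "b"): 2, ("a", "c"): 3, ("a", "d"): 4, ("a", "e"): 4,
         ("b", "c"): 3, ("b", "d"): 4, ("b", "e"): 4,
         ("c", "d"): 3, ("c", "e"): 3, ("d", "e"): 2}
D = {x: {} for x in "abcde"}
for (x, y), value in pairs.items():
    D[x][y] = D[y][x] = value

true_tree = {make_quartet("a", "b", "c", "d"), make_quartet("a", "b", "c", "e"),
             make_quartet("a", "b", "d", "e"), make_quartet("a", "c", "d", "e"),
             make_quartet("b", "c", "d", "e")}
wrong_tree = {make_quartet("a", "c", "b", "d"), make_quartet("a", "c", "b", "e"),
              make_quartet("a", "b", "d", "e"), make_quartet("a", "c", "d", "e"),
              make_quartet("b", "c", "d", "e")}

print(sqs([wrong_tree, true_tree], "abcde", D))  # 1
```

Using `scores.index(max(scores))` implements the tie-breaking rule, since `list.index` returns the first position at which the maximum occurs.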

54 SQS Theorem
For all $\varepsilon > 0$, there is a polynomial $p$ such that, for all $(T, M)$ in the model, on a set $S$ of $n$ sequences generated at random on $T$ with length at least $p(n)$, we have
\[
\Pr[\mathrm{SQS}(\mathcal{T}, S) = T] > 1 - \varepsilon
\]
whenever $T$ is in $\mathcal{T}$.

58 GRAPPA News: More Speed!
  Current release (1.03) runs from 2,000 to 10,000 times faster than the original tool, while also giving more capabilities.
  Research version (1.1) runs from 10,000 to 500,000 times faster than the original tool, thanks to much better bounding.
  The 13-genome Campanulaceae now takes a few hours on a laptop instead of a few centuries on a large workstation.
  Speedup on Los Lobos is over 200,000,000!

62 Other Recent Results
  New sequence encodings for gene orders to enable classical parsimony searches.
  Combinations of fast-converging boosters with new encodings (i.e., use a new encoding and run a DCM+SQS booster on a classical parsimony optimizer): best accuracy to date.
  Combinations of fast-converging boosters with new encodings and fast heuristics (e.g., neighbor-joining): best speed/accuracy tradeoff to date.
  New results on computing inversion distances, inversion medians, etc.

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data Mary E. Cosner Dept. of Plant Biology Ohio State University Li-San Wang Dept.

More information

Phylogenetic Reconstruction

Phylogenetic Reconstruction Phylogenetic Reconstruction from Gene-Order Data Bernard M.E. Moret compbio.unm.edu Department of Computer Science University of New Mexico p. 1/71 Acknowledgments Close Collaborators: at UNM: David Bader

More information

Phylogenetic Reconstruction from Gene-Order Data

Phylogenetic Reconstruction from Gene-Order Data p.1/7 Phylogenetic Reconstruction from Gene-Order Data Bernard M.E. Moret compbio.unm.edu Department of Computer Science University of New Mexico p.2/7 Acknowledgments Close Collaborators: at UNM: David

More information

Improving Tree Search in Phylogenetic Reconstruction from Genome Rearrangement Data

Improving Tree Search in Phylogenetic Reconstruction from Genome Rearrangement Data Improving Tree Search in Phylogenetic Reconstruction from Genome Rearrangement Data Fei Ye 1,YanGuo, Andrew Lawson 1, and Jijun Tang, 1 Department of Epidemiology and Biostatistics University of South

More information

Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study

Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study Li-San Wang Robert K. Jansen Dept. of Computer Sciences Section of Integrative Biology University of Texas, Austin,

More information

BIOINFORMATICS. New approaches for reconstructing phylogenies from gene order data. Bernard M.E. Moret, Li-San Wang, Tandy Warnow and Stacia K.

BIOINFORMATICS. New approaches for reconstructing phylogenies from gene order data. Bernard M.E. Moret, Li-San Wang, Tandy Warnow and Stacia K. BIOINFORMATICS Vol. 17 Suppl. 1 21 Pages S165 S173 New approaches for reconstructing phylogenies from gene order data Bernard M.E. Moret, Li-San Wang, Tandy Warnow and Stacia K. Wyman Department of Computer

More information

Steps Toward Accurate Reconstructions of Phylogenies from Gene-Order Data 1

Steps Toward Accurate Reconstructions of Phylogenies from Gene-Order Data 1 Steps Toward Accurate Reconstructions of Phylogenies from Gene-Order Data Bernard M.E. Moret, Jijun Tang, Li-San Wang, and Tandy Warnow Department of Computer Science, University of New Mexico Albuquerque,

More information

New Approaches for Reconstructing Phylogenies from Gene Order Data

New Approaches for Reconstructing Phylogenies from Gene Order Data New Approaches for Reconstructing Phylogenies from Gene Order Data Bernard M.E. Moret Li-San Wang Tandy Warnow Stacia K. Wyman Abstract We report on new techniques we have developed for reconstructing

More information

Mathematics of Evolution and Phylogeny. Edited by Olivier Gascuel

Mathematics of Evolution and Phylogeny. Edited by Olivier Gascuel Mathematics of Evolution and Phylogeny Edited by Olivier Gascuel CLARENDON PRESS. OXFORD 2004 iv CONTENTS 12 Reconstructing Phylogenies from Gene-Content and Gene-Order Data 1 12.1 Introduction: Phylogenies

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Advances in Phylogeny Reconstruction from Gene Order and Content Data

Advances in Phylogeny Reconstruction from Gene Order and Content Data Advances in Phylogeny Reconstruction from Gene Order and Content Data Bernard M.E. Moret Department of Computer Science, University of New Mexico, Albuquerque NM 87131 Tandy Warnow Department of Computer

More information

TheDisk-Covering MethodforTree Reconstruction

TheDisk-Covering MethodforTree Reconstruction TheDisk-Covering MethodforTree Reconstruction Daniel Huson PACM, Princeton University Bonn, 1998 1 Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document

More information

BIOINFORMATICS. Scaling Up Accurate Phylogenetic Reconstruction from Gene-Order Data. Jijun Tang 1 and Bernard M.E. Moret 1

BIOINFORMATICS. Scaling Up Accurate Phylogenetic Reconstruction from Gene-Order Data. Jijun Tang 1 and Bernard M.E. Moret 1 BIOINFORMATICS Vol. 1 no. 1 2003 Pages 1 8 Scaling Up Accurate Phylogenetic Reconstruction from Gene-Order Data Jijun Tang 1 and Bernard M.E. Moret 1 1 Department of Computer Science, University of New

More information

A Practical Algorithm for Ancestral Rearrangement Reconstruction

A Practical Algorithm for Ancestral Rearrangement Reconstruction A Practical Algorithm for Ancestral Rearrangement Reconstruction Jakub Kováč, Broňa Brejová, and Tomáš Vinař 2 Department of Computer Science, Faculty of Mathematics, Physics, and Informatics, Comenius

More information

Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction ABSTRACT

Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction ABSTRACT JOURNAL OF COMPUTATIONAL BIOLOGY Volume 6, Numbers 3/4, 1999 Mary Ann Liebert, Inc. Pp. 369 386 Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction DANIEL H. HUSON, 1 SCOTT M.

More information

Phylogenetic Reconstruction: Handling Large Scale

Phylogenetic Reconstruction: Handling Large Scale p. Phylogenetic Reconstruction: Handling Large Scale and Complex Data Bernard M.E. Moret Department of Computer Science University of New Mexico p. Acknowledgments Main collaborators: Tandy Warnow (UT

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Phylogenetic Reconstruction from Arbitrary Gene-Order Data

Phylogenetic Reconstruction from Arbitrary Gene-Order Data Phylogenetic Reconstruction from Arbitrary Gene-Order Data Jijun Tang and Bernard M.E. Moret University of New Mexico Department of Computer Science Albuquerque, NM 87131, USA jtang,moret@cs.unm.edu Liying

More information

An Investigation of Phylogenetic Likelihood Methods

An Investigation of Phylogenetic Likelihood Methods An Investigation of Phylogenetic Likelihood Methods Tiffani L. Williams and Bernard M.E. Moret Department of Computer Science University of New Mexico Albuquerque, NM 87131-1386 Email: tlw,moret @cs.unm.edu

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

Industrial Applications of High-Performance Computing for Phylogeny Reconstruction

Industrial Applications of High-Performance Computing for Phylogeny Reconstruction Industrial Applications of High-Performance Computing for Phylogeny Reconstruction David A. Bader a, Bernard M.E. Moret b, and Lisa Vawter c a Electrical and Computer Engineering Department, University

More information

Isolating - A New Resampling Method for Gene Order Data

Isolating - A New Resampling Method for Gene Order Data Isolating - A New Resampling Method for Gene Order Data Jian Shi, William Arndt, Fei Hu and Jijun Tang Abstract The purpose of using resampling methods on phylogenetic data is to estimate the confidence

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Jijun Tang and Bernard M.E. Moret. Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA

Jijun Tang and Bernard M.E. Moret. Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA BIOINFORMATICS Vol. 19 Suppl. 1 2003, pages i30 i312 DOI:.93/bioinformatics/btg42 Scaling up accurate phylogenetic reconstruction from gene-order data Jijun Tang and Bernard M.E. Moret Department of Computer

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION Industrial Applications of High-Performance Computing for Phylogeny Reconstruction David A. Bader a, Bernard M.E. Moret b,andlisavawter c a Electrical and Computer Engineering Department, University of

More information

A few logs suce to build (almost) all trees: Part II

A few logs suce to build (almost) all trees: Part II Theoretical Computer Science 221 (1999) 77 118 www.elsevier.com/locate/tcs A few logs suce to build (almost) all trees: Part II Peter L. Erdős a;, Michael A. Steel b,laszlo A.Szekely c, Tandy J. Warnow

More information

High-Performance Algorithm Engineering for Large-Scale Graph Problems and Computational Biology

High-Performance Algorithm Engineering for Large-Scale Graph Problems and Computational Biology High-Performance Algorithm Engineering for Large-Scale Graph Problems and Computational Biology David A. Bader Electrical and Computer Engineering Department, University of New Mexico, Albuquerque, NM

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

LOWER BOUNDS ON SEQUENCE LENGTHS REQUIRED TO RECOVER THE EVOLUTIONARY TREE. (extended abstract submitted to RECOMB '99)

LOWER BOUNDS ON SEQUENCE LENGTHS REQUIRED TO RECOVER THE EVOLUTIONARY TREE. (extended abstract submitted to RECOMB '99) LOWER BOUNDS ON SEQUENCE LENGTHS REQUIRED TO RECOVER THE EVOLUTIONARY TREE MIKL OS CS } UR OS AND MING-YANG KAO (extended abstract submitted to RECOMB '99) Abstract. In this paper we study the sequence

More information
