Exploring Treespace. Katherine St. John. Lehman College & the Graduate Center. City University of New York. 20 June 2011

(Joint work with the Treespace Working Group, CUNY: Ann Marie Alcocer, Kadian Brown, Alan Joseph Caceres, Juan Castillo, Samantha Daley, John De Jesus, Eric Ford, Kaitlin Hansen, Michael Hintze, Daniele Ippolito, Jinnie Lee, Oliver Mendez, & Diquan Moore)

Phylogenetic Trees
- Leaves represent extant (living) species.
- Internal nodes represent extinct species.
- If the tree is rooted, the root represents the common ancestor of all the species.
[Figure: Charles Darwin, 1837]

Some Uses of Evolutionary Trees (David Hillis, 2002)
- classifying species
- building the Tree of Life
- designing the flu vaccine and other drugs
- determining the origins of HIV infection

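How Many Phylogenetic Trees?
Semple & Steel 2003: the number of unrooted binary phylogenetic trees on n labelled leaves is
$1 \cdot 3 \cdot 5 \cdots (2n-5) = (2n-5)!! \sim \frac{2^{n-2}\, n!\, n^{-5/2}}{\sqrt{\pi}} \sim \frac{1}{2\sqrt{2}} \left(\frac{2}{e}\right)^{n} n^{n-2}$.
(For n = 50 there are already about 2.8 × 10^74 possible tree topologies, and a few more taxa push the count past the usual estimate of roughly 10^80 atoms in the universe.)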
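
The exact count $(2n-5)!!$ and its two asymptotic forms, $2^{n-2} n!\, n^{-5/2}/\sqrt{\pi}$ and $(2/e)^n n^{n-2}/(2\sqrt{2})$, can be sanity-checked numerically. The Python sketch below is our own illustration (the function names are not from the talk): it computes the double factorial exactly with integers and compares it with both approximations in log space, so the large values never overflow floating point. The two ratio columns should approach 1 as n grows.

```python
import math


def num_unrooted_binary_trees(n: int) -> int:
    """Exact (2n-5)!! = 1 * 3 * 5 * ... * (2n-5), for n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def log_approx_gamma(n: int) -> float:
    """Natural log of 2^(n-2) * n! * n^(-5/2) / sqrt(pi)."""
    return ((n - 2) * math.log(2) + math.lgamma(n + 1)
            - 2.5 * math.log(n) - 0.5 * math.log(math.pi))


def log_approx_power(n: int) -> float:
    """Natural log of (1 / (2 sqrt 2)) * (2/e)^n * n^(n-2)."""
    return (-math.log(2 * math.sqrt(2)) + n * math.log(2 / math.e)
            + (n - 2) * math.log(n))


for n in (5, 10, 50, 1000):
    log_exact = math.log(num_unrooted_binary_trees(n))  # math.log accepts huge ints
    print(n,
          round(log_exact / math.log(10), 2),                   # log10 of the exact count
          round(math.exp(log_exact - log_approx_gamma(n)), 4),  # exact / first form
          round(math.exp(log_exact - log_approx_power(n)), 4))  # exact / second form
```

For n = 50 the exact count has 75 digits, i.e. about 2.8 × 10^74.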

Searching for Optimal Trees
Local search techniques prevail: at each step, choose the next tree from among the current tree's neighbors.
There are many different ways to define neighbors; most rely on an underlying metric between trees.
[Figure: NNI treespace for n = 6]
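
A minimal sketch of the local search loop described above, in Python. It is deliberately generic: `neighbors` stands in for whichever neighborhood is used (NNI, SPR, TBR) and `score` for an optimality criterion, for example a parsimony score, with lower being better. Both callables and the name `hill_climb` are illustrative assumptions, not the API of any particular phylogenetics program. This greedy version stops at the first local optimum, which is exactly why the choice of neighborhood matters.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def hill_climb(start: S,
               neighbors: Callable[[S], Iterable[S]],
               score: Callable[[S], float],
               max_steps: int = 10_000) -> S:
    """Steepest-descent local search: move to the best strictly better neighbour
    until none exists (a local optimum) or max_steps is reached."""
    current, current_score = start, score(start)
    for _ in range(max_steps):
        best, best_score = None, current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s < best_score:
                best, best_score = candidate, s
        if best is None:  # no neighbour improves: local optimum reached
            break
        current, current_score = best, best_score
    return current


# Toy usage on integers: neighbours are x - 1 and x + 1, score is (x - 7)^2.
print(hill_climb(0, lambda x: (x - 1, x + 1), lambda x: (x - 7) ** 2))  # 7
```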

Popular Metrics
We focus on three popular metrics:
- Nearest Neighbor Interchange (NNI)
- Subtree Prune and Regraft (SPR)
- Tree Bisection and Reconnection (TBR)
[Figure: example NNI, SPR, and TBR moves on a tree with leaves A through G]
Note that NNI ⊂ SPR ⊂ TBR: every NNI move is an SPR move, and every SPR move is a TBR move.
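
The nesting of move types has a direct consequence for the induced distances: any sequence of k NNI moves is also a sequence of k SPR moves, and any sequence of k SPR moves is also a sequence of k TBR moves, so a richer move set never needs more moves.

```latex
\mathrm{NNI} \subseteq \mathrm{SPR} \subseteq \mathrm{TBR}
\quad\Longrightarrow\quad
d_{\mathrm{TBR}}(T_1, T_2) \;\le\; d_{\mathrm{SPR}}(T_1, T_2) \;\le\; d_{\mathrm{NNI}}(T_1, T_2)
\quad \text{for all trees } T_1, T_2 \text{ on the same leaf set.}
```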

NNI Metric
[Figure: an NNI move on a tree with leaves A through G]
The NNI distance between two trees is the minimum number of NNI moves needed to transform one into the other. Computing it is NP-hard (DasGupta et al. 1997).
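
Because computing the NNI distance is NP-hard, exact values are only practical for small trees. The Python sketch below is a brute-force illustration under our own assumptions: an unrooted binary tree is a dict mapping each node to its set of neighbors, with leaves as string labels and internal nodes as integers, and two trees are treated as equal when they have the same set of nontrivial splits (leaf bipartitions). `nni_neighbors` performs the two possible interchanges across every internal edge, and `nni_distance` runs a breadth-first search over treespace, whose cost grows exponentially with n. All names here are hypothetical.

```python
from collections import deque

# Assumed representation (illustration only): adjacency dict, str leaves, int internal nodes.


def caterpillar(labels):
    """Caterpillar ((l0,l1),l2,...,(l[n-2],l[n-1])) on n >= 3 leaves."""
    k = len(labels) - 2  # number of internal nodes
    adj = {x: set() for x in [*labels, *range(k)]}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    link(0, labels[0])
    link(0, labels[1])
    for i in range(1, k):
        link(i - 1, i)
        link(i, labels[i + 1])
    link(k - 1, labels[-1])
    return adj


def side_leaves(adj, u, v):
    """Leaf labels on v's side of the edge u-v."""
    seen, stack, out = {u, v}, [v], set()
    while stack:
        x = stack.pop()
        if isinstance(x, str):
            out.add(x)
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return out


def splits(adj):
    """Canonical key: nontrivial splits, each stored as the side without the smallest leaf."""
    leaves = {x for x in adj if isinstance(x, str)}
    ref = min(leaves)
    key = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int):
                side = side_leaves(adj, u, v)
                key.add(frozenset(leaves - side if ref in side else side))
    return frozenset(key)


def nni_neighbors(adj):
    """The 2(n-3) trees one NNI away: across each internal edge u-v, swap one
    subtree hanging off u with either subtree hanging off v."""
    out = []
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int) and u < v:
                b = min(adj[u] - {v}, key=str)  # fix one subtree at u
                for c in adj[v] - {u}:
                    t = {x: set(ys) for x, ys in adj.items()}
                    for x, y in ((u, b), (v, c)):  # detach b from u and c from v
                        t[x].remove(y)
                        t[y].remove(x)
                    for x, y in ((u, c), (v, b)):  # reattach them swapped
                        t[x].add(y)
                        t[y].add(x)
                    out.append(t)
    return out


def nni_distance(t1, t2):
    """Brute-force NNI distance by breadth-first search over treespace (small n only)."""
    target = splits(t2)
    seen = {splits(t1)}
    queue = deque([(t1, 0)])
    while queue:
        tree, d = queue.popleft()
        if splits(tree) == target:
            return d
        for nb in nni_neighbors(tree):
            key = splits(nb)
            if key not in seen:
                seen.add(key)
                queue.append((nb, d + 1))
    raise ValueError("trees must have the same leaf labels")


t = caterpillar("abcde")
print(len(nni_neighbors(t)), nni_distance(t, caterpillar("adbce")))  # 4 2
```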

SPR Distance
[Figure: SPR moves on a tree with leaves A through G]
The SPR distance is the minimum number of SPR moves needed to transform one tree into the other.
- Computing the SPR distance for rooted trees is NP-hard (Bordewich & Semple 2005).
- Computing the SPR distance for unrooted trees is NP-hard (Hickey et al. 2008).

TBR Distance
[Figure: a TBR move on a tree with leaves A through G]
The TBR distance is the minimum number of TBR moves needed to transform one tree into the other. Computing it is NP-hard (Allen & Steel 2001).

Treespace
For every n, treespace is the space of all phylogenetic trees on n taxa under a fixed metric.
[Figure: treespace for n = 5 under NNI (Bastert et al., 2002)]
The size of neighborhoods varies by metric (Allen & Steel, 2001):

Metric   General n                n = 5
NNI      2n − 6                   4
SPR      2(n − 3)(2n − 7)         12
TBR      < (2n − 3)(n − 3)^2      12
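
The NNI and SPR rows of the table can be checked by brute force on small trees. The Python sketch below uses an ad hoc representation of our own (adjacency dict, leaves as strings, internal nodes as integers; all names hypothetical). `spr_neighbors` enumerates every unrooted SPR move: cut an edge u-w, suppress u, and reattach the pruned subtree at every other edge of the remaining tree, removing duplicates by comparing split sets. An SPR neighbor whose split set differs from the original in exactly one split is an NNI neighbor, which also illustrates NNI ⊂ SPR. The sketch only tries caterpillar trees and does not cover the TBR row.

```python
# Assumed representation (illustration only): adjacency dict, str leaves, int internal nodes.


def caterpillar(labels):
    """Caterpillar ((l0,l1),l2,...,(l[n-2],l[n-1])) on n >= 3 leaves."""
    k = len(labels) - 2  # number of internal nodes
    adj = {x: set() for x in [*labels, *range(k)]}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    link(0, labels[0])
    link(0, labels[1])
    for i in range(1, k):
        link(i - 1, i)
        link(i, labels[i + 1])
    link(k - 1, labels[-1])
    return adj


def side_leaves(adj, u, v):
    """Leaf labels on v's side of the edge u-v."""
    seen, stack, out = {u, v}, [v], set()
    while stack:
        x = stack.pop()
        if isinstance(x, str):
            out.add(x)
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return out


def splits(adj):
    """Canonical key: nontrivial splits, each stored as the side without the smallest leaf."""
    leaves = {x for x in adj if isinstance(x, str)}
    ref = min(leaves)
    key = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int):
                side = side_leaves(adj, u, v)
                key.add(frozenset(leaves - side if ref in side else side))
    return frozenset(key)


def spr_neighbors(adj):
    """Trees one unrooted SPR move away, deduplicated by split set; `adj` itself excluded."""
    start, found = splits(adj), {}
    for u in [x for x in adj if isinstance(x, int)]:
        for w in adj[u]:
            # Prune the subtree on w's side of edge u-w; suppress u by joining x and y.
            x, y = adj[u] - {w}
            cut = {k: set(vs) for k, vs in adj.items()}
            cut[u] = {w}
            for a, b in ((x, y), (y, x)):
                cut[a].remove(u)
                cut[a].add(b)
            # Nodes of the remaining tree: the component containing x (and y).
            rest, stack = {x}, [x]
            while stack:
                for nb in cut[stack.pop()] - rest:
                    rest.add(nb)
                    stack.append(nb)
            edges = {frozenset((r, s)) for r in rest for s in cut[r]}
            # Regraft: subdivide every other edge p-q of the remaining tree with u.
            for p, q in edges - {frozenset((x, y))}:
                t = {k: set(vs) for k, vs in cut.items()}
                t[p].remove(q)
                t[q].remove(p)
                t[u] = {w, p, q}
                t[p].add(u)
                t[q].add(u)
                key = splits(t)
                if key != start:
                    found[key] = t
    return list(found.values())


for n in range(4, 8):
    tree = caterpillar("abcdefg"[:n])
    spr = [splits(s) for s in spr_neighbors(tree)]
    nni = [k for k in spr if len(k - splits(tree)) == 1]  # exactly one split changed
    print(f"n={n}: NNI {len(nni)} vs {2 * (n - 3)}, "
          f"SPR {len(spr)} vs {2 * (n - 3) * (2 * n - 7)}")
```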

Bryant's Challenge: Walking Through Trees (David Bryant)
Combinatorial Challenge I: An NNI-walk is a sequence $T_1, T_2, \ldots, T_k$ of unrooted binary phylogenetic trees in which each consecutive pair of trees differs by a single NNI.
- What is the shortest NNI-walk that passes through all binary trees on n leaves?
- Suppose we are given a tree T. What is the shortest NNI-walk that passes through all the trees that lie at most one SPR (subtree prune and regraft) move from T?
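
Checking whether a given sequence is an NNI-walk, and whether it passes through a given set of trees, is easy once trees are encoded by their splits: two binary trees on the same leaves are one NNI apart exactly when their sets of nontrivial splits differ in a single split. The Python sketch below relies on that fact. `tree_key`, `is_nni_walk`, and `covers` are hypothetical helper names of our own, and each tree is specified by listing its nontrivial clusters.

```python
from typing import FrozenSet, Iterable, Sequence

Key = FrozenSet[FrozenSet[str]]  # a tree, as its set of nontrivial splits


def tree_key(clusters: Iterable[Iterable[str]], leaves: Iterable[str]) -> Key:
    """Encode a binary unrooted tree by its nontrivial splits; each split is
    stored as the side that does not contain the smallest leaf label."""
    leaves = frozenset(leaves)
    ref = min(leaves)
    key = set()
    for cluster in clusters:
        side = frozenset(cluster)
        key.add(leaves - side if ref in side else side)
    return frozenset(key)


def one_nni_apart(t1: Key, t2: Key) -> bool:
    """Binary trees on the same leaves are one NNI apart iff exactly one split differs."""
    return len(t1 - t2) == 1 and len(t2 - t1) == 1


def is_nni_walk(walk: Sequence[Key]) -> bool:
    """True if every consecutive pair of trees in the walk differs by a single NNI."""
    return all(one_nni_apart(a, b) for a, b in zip(walk, walk[1:]))


def covers(walk: Sequence[Key], targets: Iterable[Key]) -> bool:
    """True if the walk passes through every target tree at least once."""
    return set(targets) <= set(walk)


leaves = "abcde"
t1 = tree_key([{"a", "b"}, {"d", "e"}], leaves)  # ((a,b),c,(d,e))
t2 = tree_key([{"a", "c"}, {"d", "e"}], leaves)  # ((a,c),b,(d,e))
t3 = tree_key([{"a", "c"}, {"b", "e"}], leaves)  # ((a,c),d,(b,e))
print(is_nni_walk([t1, t2, t3]), is_nni_walk([t1, t3]))  # True False
```

For the first question, `targets` would be every tree on n leaves; for the second, the trees within one SPR move of T.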

Hamiltonicity
Recall that a graph is Hamiltonian if there exists a path that visits every node exactly once.
Caceres, Daley, De Jesus, Hintze, Moore, & St. John, 2011:
- Theorem: For all n, treespace is Hamiltonian for the SPR and TBR metrics.
- Theorem: For all n ≤ 7, treespace is Hamiltonian for the NNI metric.
- Open: What is the shortest walk through the NNI treespace for n ≥ 8?
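
For very small n, the NNI statement can be checked directly. The Python sketch below is our own brute-force illustration, not the construction used in the proof: it enumerates all 15 unrooted binary trees on five leaves by stepwise addition, joins two trees when their split sets differ in exactly one split (that is, when they are one NNI apart), and looks for a Hamiltonian path by plain backtracking. The representation and all function names are assumptions; backtracking is exponential, so this is only a toy for small n.

```python
# Assumed representation (illustration only): adjacency dict, str leaves, int internal nodes.


def side_leaves(adj, u, v):
    """Leaf labels on v's side of the edge u-v."""
    seen, stack, out = {u, v}, [v], set()
    while stack:
        x = stack.pop()
        if isinstance(x, str):
            out.add(x)
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return out


def splits(adj):
    """Nontrivial splits, each stored as the side without the smallest leaf."""
    leaves = {x for x in adj if isinstance(x, str)}
    ref = min(leaves)
    key = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int):
                side = side_leaves(adj, u, v)
                key.add(frozenset(leaves - side if ref in side else side))
    return frozenset(key)


def all_trees(labels):
    """Every unrooted binary tree on `labels` (n >= 3), by stepwise addition of leaves."""
    a, b, c = labels[:3]
    trees = [{0: {a, b, c}, a: {0}, b: {0}, c: {0}}]
    for leaf in labels[3:]:
        grown = []
        for t in trees:
            new = 1 + max(x for x in t if isinstance(x, int))  # fresh internal label
            for p, q in {frozenset((x, y)) for x in t for y in t[x]}:  # attach to edge p-q
                s = {k: set(vs) for k, vs in t.items()}
                s[p].remove(q)
                s[q].remove(p)
                s[new] = {p, q, leaf}
                s[p].add(new)
                s[q].add(new)
                s[leaf] = {new}
                grown.append(s)
        trees = grown
    return trees


def nni_graph(keys):
    """Binary trees are one NNI apart exactly when their split sets differ in one split."""
    return {i: [j for j, kj in enumerate(keys) if len(ki - kj) == 1]
            for i, ki in enumerate(keys)}


def hamiltonian_path(graph):
    """Plain backtracking; returns a list of vertices or None (exponential in general)."""
    n = len(graph)

    def extend(path, used):
        if len(path) == n:
            return list(path)
        for nxt in graph[path[-1]]:
            if nxt not in used:
                path.append(nxt)
                used.add(nxt)
                found = extend(path, used)
                if found:
                    return found
                path.pop()
                used.remove(nxt)
        return None

    for start in graph:
        result = extend([start], {start})
        if result:
            return result
    return None


keys = [splits(t) for t in all_trees("abcde")]
graph = nni_graph(keys)
path = hamiltonian_path(graph)
print(len(keys), path)
```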

Proof Sketch of Walks of Treespace
By induction on n:
- Assume there is a path through the treespace of n-taxa trees, and build from it a path through the treespace of (n + 1)-taxa trees.
- Expand each n-taxa tree into 2n − 3 (n + 1)-taxa trees, one for each edge to which the new leaf can be attached.
- For SPR (and TBR), there is a Hamiltonian path within each such group. (This is not true for NNI, where each group admits only a 2-walk.)
- Glue the paths together to make a path for the whole space.
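
The expansion step can be illustrated concretely. An unrooted binary tree with n leaves has 2n − 3 edges, so attaching a new leaf to each edge yields a group of 2n − 3 trees, and every (n + 1)-leaf tree arises from exactly one parent. Any two trees in a group differ only in where the new leaf hangs, so pruning and regrafting that single leaf turns one into the other; each group is therefore a clique under SPR (and hence under TBR), which makes a Hamiltonian path inside it immediate. The Python sketch below, using our own representation and hypothetical names `expand` and `remove_leaf`, checks these facts for n = 5; it does not attempt the gluing step.

```python
# Assumed representation (illustration only): adjacency dict, str leaves, int internal nodes.


def side_leaves(adj, u, v):
    """Leaf labels on v's side of the edge u-v."""
    seen, stack, out = {u, v}, [v], set()
    while stack:
        x = stack.pop()
        if isinstance(x, str):
            out.add(x)
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return out


def splits(adj):
    """Nontrivial splits, each stored as the side without the smallest leaf."""
    leaves = {x for x in adj if isinstance(x, str)}
    ref = min(leaves)
    key = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int):
                side = side_leaves(adj, u, v)
                key.add(frozenset(leaves - side if ref in side else side))
    return frozenset(key)


def expand(adj, leaf):
    """Attach `leaf` to each of the 2n - 3 edges of an n-leaf tree: one child per edge."""
    new = 1 + max(x for x in adj if isinstance(x, int))  # fresh internal-node label
    children = []
    for p, q in {frozenset((x, y)) for x in adj for y in adj[x]}:
        t = {k: set(vs) for k, vs in adj.items()}
        t[p].remove(q)
        t[q].remove(p)
        t[new] = {p, q, leaf}
        t[p].add(new)
        t[q].add(new)
        t[leaf] = {new}
        children.append(t)
    return children


def remove_leaf(adj, leaf):
    """Inverse step: delete `leaf` and suppress its now degree-2 neighbour."""
    t = {k: set(vs) for k, vs in adj.items()}
    (u,) = t.pop(leaf)
    t[u].remove(leaf)
    x, y = t.pop(u)
    t[x].remove(u)
    t[y].remove(u)
    t[x].add(y)
    t[y].add(x)
    return t


star = {0: {"a", "b", "c"}, "a": {0}, "b": {0}, "c": {0}}
five = [c for t in expand(star, "d") for c in expand(t, "e")]  # 3 * 5 = 15 trees
groups = {splits(t): expand(t, "f") for t in five}             # one group per tree
print(len(five), sorted({len(g) for g in groups.values()}),
      len({splits(c) for g in groups.values() for c in g}))    # 15 [7] 105
```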

SPR Neighborhoods
Bryant's Conjecture: What is the shortest NNI-walk on an SPR-neighborhood?
[Figure: an SPR-neighborhood for n = 6]
Theorem: For all n ≥ 6, there exist SPR-neighborhoods for which no NNI-walk through the neighborhood is a Hamiltonian path.
Proof idea: The SPR-neighborhoods of caterpillars all have 4 isolated triangles.
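
The object in Bryant's question can be built explicitly for small n. The Python sketch below (our own representation and hypothetical names) takes a caterpillar T on six leaves, collects T together with every tree one unrooted SPR move away, and then lists the NNI adjacencies among those trees, using the fact that two binary trees are one NNI apart exactly when their split sets differ in one split. It only constructs the graph; it does not reproduce the theorem's proof, and exhaustive Hamiltonian-path search on such graphs quickly becomes expensive as n grows.

```python
# Assumed representation (illustration only): adjacency dict, str leaves, int internal nodes.


def caterpillar(labels):
    """Caterpillar ((l0,l1),l2,...,(l[n-2],l[n-1])) on n >= 3 leaves."""
    k = len(labels) - 2  # number of internal nodes
    adj = {x: set() for x in [*labels, *range(k)]}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    link(0, labels[0])
    link(0, labels[1])
    for i in range(1, k):
        link(i - 1, i)
        link(i, labels[i + 1])
    link(k - 1, labels[-1])
    return adj


def side_leaves(adj, u, v):
    """Leaf labels on v's side of the edge u-v."""
    seen, stack, out = {u, v}, [v], set()
    while stack:
        x = stack.pop()
        if isinstance(x, str):
            out.add(x)
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return out


def splits(adj):
    """Nontrivial splits, each stored as the side without the smallest leaf."""
    leaves = {x for x in adj if isinstance(x, str)}
    ref = min(leaves)
    key = set()
    for u in adj:
        for v in adj[u]:
            if isinstance(u, int) and isinstance(v, int):
                side = side_leaves(adj, u, v)
                key.add(frozenset(leaves - side if ref in side else side))
    return frozenset(key)


def spr_neighbors(adj):
    """Trees one unrooted SPR move away, deduplicated by split set; `adj` itself excluded."""
    start, found = splits(adj), {}
    for u in [x for x in adj if isinstance(x, int)]:
        for w in adj[u]:
            # Prune the subtree on w's side of edge u-w; suppress u by joining x and y.
            x, y = adj[u] - {w}
            cut = {k: set(vs) for k, vs in adj.items()}
            cut[u] = {w}
            for a, b in ((x, y), (y, x)):
                cut[a].remove(u)
                cut[a].add(b)
            rest, stack = {x}, [x]  # nodes of the remaining tree
            while stack:
                for nb in cut[stack.pop()] - rest:
                    rest.add(nb)
                    stack.append(nb)
            edges = {frozenset((r, s)) for r in rest for s in cut[r]}
            for p, q in edges - {frozenset((x, y))}:  # regraft onto edge p-q
                t = {k: set(vs) for k, vs in cut.items()}
                t[p].remove(q)
                t[q].remove(p)
                t[u] = {w, p, q}
                t[p].add(u)
                t[q].add(u)
                key = splits(t)
                if key != start:
                    found[key] = t
    return list(found.values())


tree = caterpillar("abcdef")
hood = [splits(tree)] + [splits(s) for s in spr_neighbors(tree)]  # T first, then neighbours
nni_edges = [(i, j) for i in range(len(hood)) for j in range(i + 1, len(hood))
             if len(hood[i] - hood[j]) == 1]  # one split differs: one NNI apart
degree = [sum(i in e for e in nni_edges) for i in range(len(hood))]
print(len(hood), len(nni_edges), sorted(degree))
```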

SPR Neighborhoods
Bryant's Conjecture: What is the shortest NNI-walk on an SPR-neighborhood?
[Figure: the SPR-neighborhoods for n = 7, for the caterpillar tree and for the other tree topology]
The shortest NNI-walks are longer than a Hamiltonian path by 8 and by 6, respectively.

Summary
[Figure: NNI treespace for n = 6]
Caceres, Daley, De Jesus, Hintze, Moore, & St. John, 2011:
- Theorem: For all n, treespace is Hamiltonian for the SPR and TBR metrics.
- Theorem: For all n ≤ 7, treespace is Hamiltonian for the NNI metric.
- Open: What is the shortest walk through the NNI treespace for n ≥ 8?
- Theorem: For all n ≥ 6, the minimal NNI-walk of an SPR-neighborhood need not be Hamiltonian.
- Open: What is the shortest NNI-walk of an SPR-neighborhood?

Treespace Working Group
A team of undergraduate students in mathematics and computer science contributed to this work: Ann Marie Alcocer, Kadian Brown, Alan Caceres, Samantha Daley, John De Jesus, Kaitlin Hansen, Michael Hintze, Daniele Ippolito, Jinnie Lee, Oliver Mendez, and Diquan Moore.

Acknowledgments
- The organizers and the Isaac Newton Institute
- The US National Science Foundation, for its generous support
- The New York Louis Stokes Alliance for Minority Participation in Research, for student funding
- And last, but not least, Sean