Finding a gene tree in a phylogenetic network Philippe Gambette

Similar documents
Phylogenetic networks: overview, subclasses and counting problems Philippe Gambette

From graph classes to phylogenetic networks Philippe Gambette

Solving the Tree Containment Problem for Genetically Stable Networks in Quadratic Time

A new algorithm to construct phylogenetic networks from trees

Beyond Galled Trees Decomposition and Computation of Galled Networks

Regular networks are determined by their trees

On the Subnet Prune and Regraft distance

arxiv: v1 [q-bio.pe] 12 Jul 2017

Restricted trees: simplifying networks with bottlenecks

Phylogenetic Networks, Trees, and Clusters

Properties of normal phylogenetic networks

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS

An introduction to phylogenetic networks

Splits and Phylogenetic Networks. Daniel H. Huson

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation

Phylogenetic Networks with Recombination

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

A program to compute the soft Robinson Foulds distance between phylogenetic networks

ALGORITHMIC STRATEGIES FOR ESTIMATING THE AMOUNT OF RETICULATION FROM A COLLECTION OF GENE TREES

arxiv: v5 [q-bio.pe] 24 Oct 2016

Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies

Intraspecific gene genealogies: trees grafting into networks

A Phylogenetic Network Construction due to Constrained Recombination

The Combinatorics of Finite Metric Spaces and the (Re-)Construction of Phylogenetic Trees.

A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES. 1. Introduction

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore

UNICYCLIC NETWORKS: COMPATIBILITY AND ENUMERATION

Reconstruction of certain phylogenetic networks from their tree-average distances

Tree-average distances on certain phylogenetic networks have their weights uniquely determined

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES

Inferring a level-1 phylogenetic network from a dense set of rooted triplets

Reconstructing Phylogenetic Networks Using Maximum Parsimony

arxiv: v3 [q-bio.pe] 1 May 2014

RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS

Improved maximum parsimony models for phylogenetic networks

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

TheDisk-Covering MethodforTree Reconstruction

Lassoing phylogenetic trees

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation

arxiv: v1 [cs.cc] 9 Oct 2014

A phylogenomic toolbox for assembling the tree of life

Reconstructing Phylogenetic Networks

Integer Programming for Phylogenetic Network Problems

Bioinformatics Advance Access published August 23, 2006

A 3-approximation algorithm for the subtree distance between phylogenies

Efficient Parsimony-based Methods for Phylogenetic Network Reconstruction

Reticulation in Evolution

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

Reconstructing Trees from Subtree Weights

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

ISMB-Tutorial: Introduction to Phylogenetic Networks. Daniel H. Huson

Aphylogenetic network is a generalization of a phylogenetic tree, allowing properties that are not tree-like.

The Complexity of Computing the Sign of the Tutte Polynomial

arxiv: v1 [cs.ds] 1 Nov 2018

Distances that Perfectly Mislead

Exploring Treespace. Katherine St. John. Lehman College & the Graduate Center. City University of New York. 20 June 2011

arxiv: v1 [q-bio.pe] 1 Jun 2014

Downloaded 10/29/15 to Redistribution subject to SIAM license or copyright; see

Introduction to Phylogenetic Networks

Department of Mathematics and Statistics University of Canterbury Private Bag 4800 Christchurch, New Zealand

PHYLOGENIES are the main tool for representing evolutionary

Allen Holder - Trinity University

BINF6201/8201. Molecular phylogenetic methods

A fuzzy weighted least squares approach to construct phylogenetic network among subfamilies of grass species

1.1 The (rooted, binary-character) Perfect-Phylogeny Problem

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.

Faster quantum algorithm for evaluating game trees

Combinatorial and probabilistic methods in biodiversity theory

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Physical Mapping. Restriction Mapping. Lecture 12. A physical map of a DNA tells the location of precisely defined sequences along the molecule.

Structure-Based Comparison of Biomolecules

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

Preface. Contributors

Integrating Sequence and Topology for Efficient and Accurate Detection of Horizontal Gene Transfer

Inference of Parsimonious Species Phylogenies from Multi-locus Data

Parsimony via Consensus

Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel

arxiv: v1 [q-bio.pe] 16 Aug 2007

CS 394C Algorithms for Computational Biology. Tandy Warnow Spring 2012

Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction ABSTRACT

A new e±cient algorithm for inferring explicit hybridization networks following the Neighbor-Joining principle

An Overview of Combinatorial Methods for Haplotype Inference

BIOINFORMATICS. Scaling Up Accurate Phylogenetic Reconstruction from Gene-Order Data. Jijun Tang 1 and Bernard M.E. Moret 1

Limitations of Algorithm Power

Supertree Algorithms for Ancestral Divergence Dates and Nested Taxa

Maximizing a Submodular Function with Viability Constraints

Haplotype Inference Constrained by Plausible Haplotype Data

VIII. NP-completeness

Discrete Applied Mathematics

DISTINGUISH HARD INSTANCES OF AN NP-HARD PROBLEM USING MACHINE LEARNING

RECONSTRUCTING PATTERNS OF RETICULATE

Distance Corrections on Recombinant Sequences

Dr. Amira A. AL-Hosary

Effects of Gap Open and Gap Extension Penalties

Parsimonious cluster systems

Minimal Phylogenetic Supertrees and Local Consensus Trees

(Nearly)-tight bounds on the linearity and contiguity of cographs

Transcription:

LRI-LIX BioInfo Seminar 19/01/2017 - Palaiseau Finding a gene tree in a phylogenetic network Philippe Gambette

Outline Phylogenetic networks Classes of phylogenetic networks The Tree Containment Problem

Outline Phylogenetic networks Classes of phylogenetic networks The Tree Contaiment Problem

Phylogenetic trees Phylogenetic tree of a set of species species tree S A B tokogeny of individuals C A B C

Genetic material transfers Transfers of genetic material between coexisting species lateral gene transfer hybridization recombination S A B C A B C

Genetic material transfers Transfers of genetic material between coexisting species lateral gene transfer hybridization recombination species tree S A B C gene G1 A A B network N C A B C B C

Genetic material transfers Transfers of genetic material between coexisting species lateral gene transfer hybridization recombination species tree S gene G1 A A B C B C incompatible gene trees gene G2 A B network N C A B C A B C

Phylogenetic networks Phylogenetic network network representing evolution data explicit phylogenetic networks model evolution Simplistic synthesis diagram galled network level-2 network Dendroscope HorizStory abstract phylogenetic networks classify, visualize data split network minimum spanning network median network SplitsTree Network TCS

Today focus on explicit phylogenetic networks A gallery of explicit phylogenetic networks http//phylnet.univ-mlv.fr/recophync/networkdraw.php

Books about phylogenetic networks Dress, Huber, Koolen, Moulton, Spillner, 2012 Gusfield, 2014 Huson, Rupp, Scornavacca, 2011 Morrison, 2011

Workshops about phylogenetic networks The Future of Phylogenetic Networks, 15-19 October 2012, Lorentz Center, Leiden, The Netherlands Utilizing Genealogical Phylogenetic Networks in Evolutionary Biology Touching the Data, 7-11 July 2014, Lorentz Center, Leiden, The Netherlands The Phylogenetic Network Workshop, 27-31 Jul 2015, Institute for Mathematical Science (National University of Singapore)

Who is who in Phylogenetic Networks? Based on BibAdmin by Sergiu Chelcea + tag clouds, date histograms, journal lists, keyword definitions, co-author graphs http//phylnet.univ-mlv.fr

Who is who in Phylogenetic Networks? Who is Who in Phylogenetic Networks, Articles, Authors & Programs Analysis of the co-author and keyword graphs internship of Tushar Agarwal Agarwal, Gambette & Morrison, arxiv, 2017 http//phylnet.univ-mlv.fr

Who is who in Phylogenetic Networks?

Who is who in Phylogenetic Networks? input software

Who is who in Phylogenetic Networks? input software classes E problems algorithmic properties

Outline Phylogenetic networks Classes of phylogenetic networks The Tree Contaiment Problem

Classes of phylogenetic networks joint work with Maxime Morgado and Narges Tavassoli http//phylnet.univ-mlv.fr/isiphync/

Classes of phylogenetic networks inclusions joint work with Maxime Morgado and Narges Tavassoli

Classes of phylogenetic networks inclusions joint work with Maxime Morgado and Narges Tavassoli

Classes of phylogenetic networks inclusions joint work with Maxime Morgado and Narges Tavassoli http//phylnet.univ-mlv.fr/isiphync/network.php?id=4

Classes of phylogenetic networks inclusions level = maximum number of reticulation vertices among all bridgeless components in the network cluster = set of leaves below a vertex joint work with Maxime Morgado and Narges Tavassoli http//phylnet.univ-mlv.fr/isiphync/network.php?id=4

Classes of phylogenetic networks inclusions level = maximum number of reticulation vertices among all bridgeless components in the network cluster = set of leaves below a vertex joint work with Maxime Morgado and Narges Tavassoli http//phylnet.univ-mlv.fr/isiphync/network.php?id=4

Classes of phylogenetic networks new inclusions joint work with Maxime Morgado and Narges Tavassoli

Classes of phylogenetic networks joint work with Maxime Morgado and Narges Tavassoli

Classes of phylogenetic networks Polynomial time algorithm available on class A works on subclass B NP-completeness on class B NP-completeness on superclass A (similar to ISGCI) joint work with Maxime Morgado and Narges Tavassoli

Outline Phylogenetic networks Classes of phylogenetic networks The Tree Contaiment Problem

Phylogenetic network reconstruction 1 2 3 4 5 6 7 AATTGCAG ACCTGCAG GCTTGCCG ATTTGCAG TAGCCCAAAAT TAGACCAAT TAGACAAGAAT AAGACCAAAT TAGACAAGAAT ACTTGCAG TAGCACAAAAT ACCTGGTG TAAAAT G1 G2 {gene sequences} distance methods Bandelt & Dress 1992 - Legendre & Makarenkov 2000 Bryant & Moulton 2002 - Chan, Jansson, Lam & Yiu 2006 parsimony methods Hein 1990 - Kececioglu & Gusfield 1994 - Jin, Nakhleh, Snir, Tuller 2009 - Park, Jin & Nakhleh 2010 - Kannan & Wheeler, 2012 likelihood methods network N Snir & Tuller 2009 - Jin, Nakhleh, Snir, Tuller 2009 Velasco & Sober 2009 - Meng & Kubatko 2009

Phylogenetic network reconstruction Problem methods are usually slow, especially with rapidly increasing sequence length. 1 2 3 4 5 6 7 AATTGCAG ACCTGCAG GCTTGCCG ATTTGCAG TAGCCCAAAAT TAGACCAAT TAGACAAGAAT AAGACCAAAT TAGACAAGAAT ACTTGCAG TAGCACAAAAT ACCTGGTG TAAAAT G1 G2 {gene sequences} distance methods Bandelt & Dress 1992 - Legendre & Makarenkov 2000 Bryant & Moulton 2002 - Chan, Jansson, Lam & Yiu 2006 parsimony methods Hein 1990 - Kececioglu & Gusfield 1994 - Jin, Nakhleh, Snir, Tuller 2009 - Park, Jin & Nakhleh 2010 - Kannan & Wheeler, 2012 likelihood methods network N Snir & Tuller 2009 - Jin, Nakhleh, Snir, Tuller 2009 Velasco & Sober 2009 - Meng & Kubatko 2009

Phylogenetic network reconstruction 1 2 3 4 5 6 7 AATTGCAG ACCTGCAG GCTTGCCG ATTTGCAG TAGCCCAAAAT TAGACCAAT TAGACAAGAAT AAGACCAAAT TAGACAAGAAT ACTTGCAG TAGCACAAAAT ACCTGGTG TAAAAT G1 G2 {gene sequences} Reconstruction of a tree for each gene present in several species Guindon & Gascuel, SB, 2003 T1 {trees} T2 HOGENOM Database Dufayard, Duret, Penel, Gouy, Rechenmann & Perrière, BioInf, 2005 Tree reconciliation or consensus explicit network optimal super-network N - contains the input trees - has the smallest number of reticulations http//doua.prabi.fr/databases/hogenom/home.php?contents=hogenom4

Phylogenetic network reconstruction 1 2 3 4 5 6 7 AATTGCAG ACCTGCAG GCTTGCCG ATTTGCAG TAGCCCAAAAT TAGACCAAT TAGACAAGAAT AAGACCAAAT TAGACAAGAAT ACTTGCAG TAGCACAAAAT ACCTGGTG TAAAAT G1 G2 {gene sequences} Reconstruction of a tree for each gene present in several species Guindon & Gascuel, SB, 2003 T1 {trees} T2 HOGENOM Database Dufayard, Duret, Penel, Gouy, Rechenmann & Perrière, BioInf, 2005 1470 species, >290 000 trees Tree reconciliation or consensus explicit network optimal super-network N - contains the input trees - has the smallest number of reticulations http//doua.prabi.fr/databases/hogenom/home.php?contents=hogenom4

Phylogenetic network reconstruction 1 2 3 4 5 6 7 AATTGCAG ACCTGCAG GCTTGCCG ATTTGCAG TAGCCCAAAAT TAGACCAAT TAGACAAGAAT AAGACCAAAT TAGACAAGAAT ACTTGCAG TAGCACAAAAT ACCTGGTG TAAAAT G1 G2 {gene sequences} Reconstruction of a tree for each gene present in several species Guindon & Gascuel, SB, 2003 T1 {trees} T2 HOGENOM Database Dufayard, Duret, Penel, Gouy, Rechenmann & Perrière, BioInf, 2005 1470 species, >290 000 trees Tree reconciliation or consensus explicit network Tree Containment Problem optimal super-network N - contains the input trees - has the smallest number of reticulations http//doua.prabi.fr/databases/hogenom/home.php?contents=hogenom4

The Tree Containment Problem (T.C.P.) Input A binary phylogenetic network N and a tree T over the same set of taxa. Question Does N display T?

The Tree Containment Problem (T.C.P.) Input A binary phylogenetic network N and a tree T over the same set of taxa. Question Does N display T? Can we remove one incoming arc, for each vertex with >1 parent in N, such that the obtained tree is equivalent to T? T N a b c d a b c d

The Tree Containment Problem (T.C.P.) Input A binary phylogenetic network N and a tree T over the same set of taxa. Question Does N display T? Can we remove one incoming arc, for each vertex with >1 parent in N, such that the obtained tree is equivalent to T? T N a b c d a b c d

The Tree Containment Problem (T.C.P.) Input A binary phylogenetic network N and a tree T over the same set of taxa. Question Does N display T? NP-complete in general (Kanj, Nakhleh, Than & Xia, 2008) NP-complete for tree-sibling, time-consistent, regular networks (Iersel, Semple & Steel, 2010) Polynomial-time solvable for normal networks, for binary tree-child networks, and for level-k networks (Iersel, Semple & Steel, 2010)

Classes of phylogenetic networks and the T.C.P. State of the art

Classes of phylogenetic networks and the T.C.P. State of the art

Classes of phylogenetic networks and the T.C.P. Recent results (since 2015)

Reticulation-visible and nearly-stable networks A vertex u is stable if there exists a leaf l such that all paths from the root to l go through u. A phylogenetic network is reticulation-visible if every reticulation vertex is stable. A phylogenetic network is nearly-stable if for each vertex, either it is stable or its parents are. Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Strategy to get a quadratic time algorithm for T.C.P. Given N, a phylogenetic network with n leaves and the input tree T of the T.C.P. Theorem 1 If N is reticulation-visible then #{reticulation vertices of N} 4(n-1) #{vertices of N} 9n Theorem 2 If N is nearly-stable then #{reticulation vertices of N} 12(n-1) Theorem 3 Considering a longest path in N (nearly stable), it is possible, in constant time either to realize that T is not contained in N or to build a network N with less arcs than N such that T contained in N if and only if T contained in N Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Decompose N into 2n-2 paths remove one reticulation arc per reticulation, ensuring we get no «dummy leaf», to get a tree T with n leaves summarize T into a rooted binary tree T with n leaves... and 2n-2 arcs We can prove (technical) that each path contains at most 2 reticulation vertices N Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Decompose N into 2n-2 paths remove one reticulation arc per reticulation, ensuring we get no «dummy leaf», to get a tree T with n leaves summarize T into a rooted binary tree T with n leaves... and 2n-2 arcs We can prove (technical) that each path contains at most 2 reticulation vertices T Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Decompose N into 2n-2 paths remove one reticulation arc per reticulation, ensuring we get no «dummy leaf», to get a tree T with n leaves summarize T into a rooted binary tree T with n leaves... and 2n-2 arcs We can prove (technical) that each path contains at most 2 reticulation vertices T T Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Decompose N into 2n-2 paths remove one reticulation arc per reticulation, ensuring we get no «dummy leaf», to get a tree T with n leaves summarize T into a rooted binary tree T with n leaves... and 2n-2 arcs We can prove (technical) that each path contains at most 2 reticulation vertices N contains at most 4(n-1) reticulation vertices Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network «Dummy leaves»? Deleting reticulation arcs can create «dummy leaves» N a b c d Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network «Dummy leaves»? Deleting reticulation arcs can create «dummy leaves» N N a b c d a b c d Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Possible to avoid creating «dummy leaves»? Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Possible to avoid creating «dummy leaves»? Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Possible to avoid creating «dummy leaves»? Given N t1 t2 t3 r1 r2 a b c d Build G(N), bipartite graph such that X = reticulation vertices of N all vertices in X have degree 2 Y = tree vertices of N with at least one reticulation child all vertices in Y have degree 1 or 2 edge between x and y iff x is a child of y X Y r1 r2 G(N) t1 t2 t3 Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a reticulation-visible network Possible to avoid creating «dummy leaves»? Given N t1 t2 t3 r1 r2 a b c d Build G(N), bipartite graph such that X = reticulation vertices of N all vertices in X have degree 2 Y = tree vertices of N with at least one reticulation child all vertices in Y have degree 1 or 2 edge between x and y iff x is a child of y X Y r1 r2 matching covering every vertex of X edges to remove from N G(N) t1 t2 t3 Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a nearly-stable network Reduce nearly-stable networks to stable networks nearly-stable network reticulation-visible network Claim #unstablereticulations 2 #stablereticulations Proof Each unstable reticulation must have a stable reticulation child. At most two unstable reticulations may have the same reticulation child. Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Number of reticulations of a nearly-stable network Reduce nearly-stable networks to stable networks #unstablereticulations 2 #stablereticulations #stablereticulations 4(n-1) #reticulations 12(n-1) Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Deleting reticulation arcs to simplify the question Simplify N by removing an arc near the end of a longest path P. Case analysis (8 cases) Algorithm for nearly-stable networks Repeat - Compute a longest path P in N - Simplify N by considering the subnetwork at the end of P according to the 8 cases above (arc removal + contraction) - Replace a «cherry» (2 leaves + their common parent) appearing in both N and T by a new leaf Finally check that the obtained network is identical to T Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Deleting reticulation arcs to simplify the question Simplify N by removing an arc near the end of a longest path P. Case analysis (8 cases) Algorithm for nearly-stable networks Repeat - Compute a longest path P in N - Simplify N by considering the subnetwork at the end of P according to the 8 cases above (arc removal + contraction) - Replace a «cherry» (2 leaves + their common parent) appearing in both N and T by a new leaf Finally check that the obtained network is identical to T O(n) times O(n) O(1) O(1) O(n) Gambette, Gunawan, Labarre, Vialette & Zhang, RECOMB 2015

Improved algorithms O(n log n)-time algorithm for nearly-stable networks Fakcharoenphol, Kumpijit & Putwattana, JCSSE 2015 Cubic-time algorithm for reticulation-visible networks Gunawan, DasGupta & Zhang, RECOMB 2016 Bordewich & Semple, Advances in Applied Mathematics, 2016 #reticulations 3(n-1) for reticulation-visible networks Bordewich & Semple, Advances in Applied Mathematics, 2016

Perspectives... Provide a practical algorithmic toolbox using network classes http//phylnet.info/tools/ Combine the combinatorial and statistical approaches to reconstruct phylogenetic networks CNRS PICS project with L. van Iersel, S. Kelk, F. Pardi & C. Scornavacca Develop interactions with related fields host/parasite relationships population genetics stemmatology