Reconstructing Phylogenetic Networks

Similar documents
arxiv: v3 [q-bio.pe] 1 May 2014

Downloaded 10/29/15 to Redistribution subject to SIAM license or copyright; see

Improved maximum parsimony models for phylogenetic networks

arxiv: v5 [q-bio.pe] 24 Oct 2016

arxiv: v1 [q-bio.pe] 12 Jul 2017

arxiv: v1 [cs.cc] 9 Oct 2014

A new algorithm to construct phylogenetic networks from trees

An introduction to phylogenetic networks

Reconstruction of certain phylogenetic networks from their tree-average distances

Phylogenetic Networks with Recombination

Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies

arxiv: v1 [cs.ds] 2 Sep 2016

Parsimonious Migration History Problem: Complexity and Algorithms. Mohammed El-Kebir, University of Illinois at Urbana-Champaign WABI 2018

arxiv: v1 [q-bio.pe] 1 Jun 2014

Phylogenetic Networks, Trees, and Clusters

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

Properties of normal phylogenetic networks

Tree-average distances on certain phylogenetic networks have their weights uniquely determined

LEO VAN IERSEL TU DELFT

Phylogenetics: Parsimony

Branching. Teppo Niinimäki. Helsinki October 14, 2011 Seminar: Exact Exponential Algorithms UNIVERSITY OF HELSINKI Department of Computer Science

A Collection of Problems in Propositional Logic

arxiv: v1 [cs.dm] 29 Oct 2012

Evolutionary Tree Analysis. Overview

Trees. A tree is a graph which is. (a) Connected and. (b) has no cycles (acyclic).

Boolean decision diagrams and SAT-based representations

On the Space Complexity of Parameterized Problems

Restricted trees: simplifying networks with bottlenecks

From graph classes to phylogenetic networks Philippe Gambette

Integer Programming for Phylogenetic Network Problems

Limitations of Algorithm Power

The Spanning Galaxy Problem

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.

Acyclic Digraphs arising from Complete Intersections

NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS

Monadic Second Order Logic on Graphs with Local Cardinality Constraints

arxiv: v2 [cs.dm] 12 Jul 2014

X X (2) X Pr(X = x θ) (3)

Finding a gene tree in a phylogenetic network Philippe Gambette

Lower bounds for polynomial kernelization

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

Lecture 24 : Even more reductions

Problem Set 3 Due: Wednesday, October 22nd, 2014

Solving the Tree Containment Problem for Genetically Stable Networks in Quadratic Time

Graph. Supply Vertices and Demand Vertices. Supply Vertices. Demand Vertices

Theory of Computation Time Complexity

Detecting Backdoor Sets with Respect to Horn and Binary Clauses

Comp487/587 - Boolean Formulas

CS21 Decidability and Tractability

arxiv: v1 [cs.ds] 1 Nov 2018

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES

On the Subnet Prune and Regraft distance

A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES. 1. Introduction

The Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization

More on NP and Reductions

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

NP Completeness. CS 374: Algorithms & Models of Computation, Spring Lecture 23. November 19, 2015

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming

CS 301: Complexity of Algorithms (Term I 2008) Alex Tiskin Harald Räcke. Hamiltonian Cycle. 8.5 Sequencing Problems. Directed Hamiltonian Cycle

Regular networks are determined by their trees

Lecture 15 - NP Completeness 1

The complexity of enumeration and counting for acyclic conjunctive queries

Approximating Longest Directed Path

Limits to Approximability: When Algorithms Won't Help You. Note: Contents of today s lecture won t be on the exam

Inferring a level-1 phylogenetic network from a dense set of rooted triplets

Reconstructing Phylogenetic Networks Using Maximum Parsimony

The maximum edge-disjoint paths problem in complete graphs

lamsade 331 Décembre 2012 Laboratoire d Analyses et Modélisation de Systèmes pour l Aide à la Décision UMR 7243

On the Computational Hardness of Graph Coloring

ACYCLIC DIGRAPHS GIVING RISE TO COMPLETE INTERSECTIONS

The complexity of acyclic subhypergraph problems

1 Review of Vertex Cover

REPORT MAS-R0404 DECEMBER

Shorelines of islands of tractability: Algorithms for parsimony and minimum perfect phylogeny haplotyping problems

Lecture 22: Counting

HW Graph Theory SOLUTIONS (hbovik) - Q

5. Complexity Theory

Efficient Parsimony-based Methods for Phylogenetic Network Reconstruction

The Complexity of Mining Maximal Frequent Subgraphs

On vertex coloring without monochromatic triangles

CS 161: Design and Analysis of Algorithms

CS154, Lecture 15: Cook-Levin Theorem SAT, 3SAT

Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem

HARDNESS AND ALGORITHMS FOR RAINBOW CONNECTIVITY

Computational Complexity

Partitioning Metric Spaces

Bayesian Networks Factor Graphs the Case-Factor Algorithm and the Junction Tree Algorithm

k-distinct In- and Out-Branchings in Digraphs

DRAFT. Algebraic computation models. Chapter 14

Institute of Operating Systems and Computer Networks Algorithms Group. Network Algorithms. Tutorial 3: Shortest paths and other stuff

NP-Completeness I. Lecture Overview Introduction: Reduction and Expressiveness

Enumerating the edge-colourings and total colourings of a regular graph

Branchwidth of graphic matroids.

Phylogenetic Tree Reconstruction

I. Introduction to NP Functions and Local Search

Classes of Problems. CS 461, Lecture 23. NP-Hard. Today s Outline. We can characterize many problems into three classes:

Beyond Galled Trees Decomposition and Computation of Galled Networks

Learning Large-Alphabet and Analog Circuits with Value Injection Queries

Digital search trees JASS

Acyclic orientations of graphs

Transcription:

Reconstructing Phylogenetic Networks Mareike Fischer, Leo van Iersel, Steven Kelk, Nela Lekić, Simone Linz, Celine Scornavacca, Leen Stougie Centrum Wiskunde & Informatica (CWI) Amsterdam MCW Prague, 3 July 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

Definition Let X be a finite set. A (rooted) phylogenetic tree on X is a rooted tree with no indegree- outdegree- vertices whose leaves are bijectively labelled by the elements of X. 8 Mya 76 Mya 68 Mya 35 Mya Kiwi (New Zealand) Cassowary (New Guinea + Australia) Leo van Iersel (CWI) Emu (Australia) Ostrich Moa (Africa) (New Zealand) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44

Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44

Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 5 / 44

W.F. Doolittle et al. () Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 6 / 44

Definition Let X be a finite set. A (rooted) phylogenetic network on X is a rooted directed acyclic graph with no indegree- outdegree- vertices whose leaves are bijectively labelled by the elements of X. Parasitic Jaeger Pomarine Skua Great Skua Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 7 / 44

The first phylogenetic network (Buffon, 755) Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 8 / 44

Phylogenetic network for humans (Reich et al., ) Definition A reticulation is a vertex with indegree at least. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 9 / 44

Tree-based methods Compute trees from DNA sequences. Different parts of DNA might give different trees. Try to induce a phylogenetic network from the trees. Definition A phylogenetic tree T is displayed by a phylogenetic network N if T can be obtained from a subgraph of N by contracting edges. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

Example: tree T is displayed by network N a h a h g g b c d e f N b c d e f T Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

The other binary tree T displayed by network N a h a h g g b c d e f b c d e f N T Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

Challenge: try to reconstruct the network from the trees a h g a h b c d e f g a b c d e f h g b c d e f Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44

Definition The reticulation number of a phylogenetic network N is d (v). Problem Minimum Reticulation v V \{root} Instance: phylogenetic trees T, T Solution: phylogenetic network that displays T and T Minimize: reticulation number of the network. Theorem There exists a constant factor approximation algorithm for Minimum Reticulation if and only if there exists a constant factor approximation algorithm for Directed Feedback Vertex Set. Open question: how to handle more than two trees (efficiently)? Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44

Reconstructing phylogenetic networks Tree-based methods Construct trees from DNA sequences. Find a network that displays the trees and has minimum reticulation number. Sequence-based methods Find a network directly from the DNA sequences. Optimize Parsimony or Likelihood score of network. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 5 / 44

Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. ACCTG ATCTG ATCTC GTAAA TTACT Example input Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 6 / 44

Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. TTCTA ATCTA ATCTG TTAAA ACCTG ATCTG ATCTC GTAAA TTACT Example labelling of internal vertices Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 7 / 44

Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. ATCTA TTCTA ATCTG TTAAA ACCTG ATCTG ATCTC GTAAA TTACT Example of one mutation Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 8 / 44

Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. TTCTA ATCTA ATCTG TTAAA ACCTG ATCTG ATCTC GTAAA TTACT All 9 mutations. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 9 / 44

Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the interior vertices in order to minimize the total number of mutations. Polynomial-time solvable: Consider each character separately. Use dynamic programming (Fitch, 97). Two possible extensions to networks: hardwired softwired Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

Hardwired Maximum Parsimony on Networks A p-state character on X is a function α : X {,..., p}. The change c τ (e) on edge e = (u, v) w.r.t. a p-state character τ on V (N) is defined as: { if τ(u) = τ(v) c τ (e) = if τ(u) τ(v). The hardwired parsimony score of a phylogenetic network N and p-state character α is given by PS hw (N, α) = min c τ (e), τ e E(N) where the minimum is taken over all p-state characters τ on V (N) that extend α. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

Example input: (N, α) 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44

A 3-state character τ on V (N) that extends α. 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44

PS hw (N, α) = 4 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44

Softwired Maximum Parsimony on Networks The softwired parsimony score of a phylogenetic network N and p-state character α is given by PS sw (N, α) = min PS(T, α), T T (N) where T (N) is the set of trees on X displayed by N. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 5 / 44

One of the two trees on X displayed by the network 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 6 / 44

A 3-state character τ on V (T ) that extends α. 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 7 / 44

There are 3 changes 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 8 / 44

The other tree needs 4 changes 3 The minimum over the two trees is 3, so PS sw (N, α) = 3. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 9 / 44

Proposition PS hw (N, α) is not an o(n)-approximation of PS sw (N, α). Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44

Softwired Parsimony Score PS sw (N, α) = Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44

Hardwired Parsimony Score with r the number of reticulations. PS hw (N, α) = 4 = r + Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44

Proposition Let G be the graph obtained from network N by merging all leaves x with α(x) = i into a single node γ i, for i =,..., p. Then, PS hw (N, α) equals the size of a minimum multiterminal cut in G with terminals γ,..., γ p. Corollary Computing the hardwired parsimony score of a phylogenetic network and a binary character is polynomial-time solvable. Corollary Computing the hardwired parsimony score of a phylogenetic network and a p-state character, for p 3, is NP-hard and APX-hard but fixed-parameter tractable (FPT) in the parsimony score, and there exists a polynomial-time.3438-approximation for all p and a -approximation for p = 3. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 33 / 44

Example Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 34 / 44

Merge -leaves and -leaves Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 35 / 44

Hardwired Parsimony Score is 4 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 36 / 44

Observation There exists a (trivial) X -approximation for computing the softwired parsimony score of a phylogenetic network. Theorem For every constant ɛ > there is no polynomial-time approximation algorithm that approximates PS sw (N, α) to a factor X ɛ, for a phylogenetic network N and a binary character α, unless P = NP. Definition A phylogenetic network is binary if the root has outdegree and all other vertices have total degree or 3. Theorem For every constant ɛ > there is no polynomial-time approximation algorithm that approximates PS sw (N, α) to a factor X 3 ɛ, for a binary phylogenetic network N and a binary character α, unless P = NP. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 37 / 44

Proof: reduction from 3SAT Ú s s variable gadget x :x y :y z :z.................. f(n,ï) copies f(n,ï) copies clause gadget... f(n,ï)... copies x _ :y :x _ y _ z f(n,ï) copies Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 38 / 44

Binary case x :x y :y z :z dx Îx dy Îy dz Îz dx Îz dx 3 x _ y x _ :y _ z x _ :z :x _ :z Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 39 / 44

Theorem There is no FPT algorithm for computing the softwired parsimony score, with the score as parameter, unless P = NP. Definition A phylogenetic network is level-k if each biconnected component has reticulation number at most k. Theorem There is an FPT algorithm for computing the softwired parsimony score, with the level of the network as parameter. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44

ILP for softwired parsimony score min c e e E s.t. x v,s = s P c e x u,s x v,s ( y e ) c e x v,s x u,s ( y e ) y (v,r) = v:(v,r) E y e = x v,α(v) = c e, y e {, } x v,s {, } for all v V for all e = (u, v) E, s P for all e = (u, v) E, s P for each reticulation r for each non-reticulate edge e for each leaf v for all e E for all v V, s P with P = {,..., p} and α(v) the given character state of a leaf v. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44

Both parsimony scores can be computed quickly using ILP X Avg. Average computation time (s) num. of Hardwired PS Softwired PS retic. -state 3-state 4-state -state 3-state 4-state 5 7.......3 37.......6 5 54....6...8 7.8.....4.4 5 9.3.. 3.5..4. 3.6.. 5...6 3.7 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44

Future Work Are there approximation or FPT algorithms for computing the softwired parsimony score of restricted classes of networks? How to search for an optimal network? What if the different characters are not independent? Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 43 / 44

Thanks Mareike Fischer (Greifswald) Steven Kelk (Maastricht) Celine Scornavacca (Montpellier) Simone Linz (Christchurch) Leen Stougie (Amsterdam) Nela Lekić (Maastricht) Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 44 / 44