Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.

Similar documents
Tree Decompositions and Tree-Width

Binary Decision Diagrams. Graphs. Boolean Functions

Binary Decision Diagrams

Evolutionary Tree Analysis. Overview

A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES. 1. Introduction

RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS

FPT is Characterized by Useful Obstruction Sets

k-distinct In- and Out-Branchings in Digraphs

On improving matchings in trees, via bounded-length augmentations 1

Fixed-parameter tractable reductions to SAT. Vienna University of Technology

The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth

Strong Computational Lower Bounds via Parameterized Complexity

The Complexity of Constructing Evolutionary Trees Using Experiments

Clique is hard on average for regular resolution

Dynamic Programming on Trees. Example: Independent Set on T = (V, E) rooted at r V.

Polynomial kernels for constant-factor approximable problems

CS 372: Computational Geometry Lecture 4 Lower Bounds for Computational Geometry Problems

The maximum agreement subtree problem

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES

Minimal Phylogenetic Supertrees and Local Consensus Trees

Scheduling on Unrelated Parallel Machines. Approximation Algorithms, V. V. Vazirani Book Chapter 17

Clique is hard on average for regular resolution

Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees

Notes 3 : Maximum Parsimony

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore

A Cubic-Vertex Kernel for Flip Consensus Tree

Cleaning Interval Graphs

NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS

Parameterized Complexity of the Sparsest k-subgraph Problem in Chordal Graphs

A Decidable Class of Planar Linear Hybrid Systems

The NP-hardness of finding a directed acyclic graph for regular resolution

On the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar

Exploring Treespace. Katherine St. John. Lehman College & the Graduate Center. City University of New York. 20 June 2011

Propositional Calculus - Semantics (3/3) Moonzoo Kim CS Dept. KAIST

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees

FIRST-ORDER QUERY EVALUATION ON STRUCTURES OF BOUNDED DEGREE

Improved Polynomial Identity Testing for Read-Once Formulas

Chapter 5 The Witness Reduction Technique

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings

arxiv: v1 [cs.cc] 9 Oct 2014

Efficient Reassembling of Graphs, Part 1: The Linear Case

Dominating Set Counting in Graph Classes

A Phylogenetic Network Construction due to Constrained Recombination

arxiv: v1 [cs.ds] 1 Nov 2018

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

1 Non-deterministic Turing Machine

Enumerating All Maximal Frequent Subtrees

Parameterized Domination in Circle Graphs

Phylogenetic Networks, Trees, and Clusters

Isomorphism for Graphs of Bounded Distance Width 1. Graph isomorphism, Fixed parameter tractability, Distance pathwidth, Distance treewidth.

Read-Once Polynomial Identity Testing

Parsimony via Consensus

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems

Walks in Phylogenetic Treespace

arxiv: v1 [cs.ds] 20 Feb 2017

CV-width: A New Complexity Parameter for CNFs

Bounded Treewidth Graphs A Survey German Russian Winter School St. Petersburg, Russia

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

Key words. computational learning theory, evolutionary trees, PAC-learning, learning of distributions,

arxiv: v1 [cs.ds] 21 May 2013

Learning Partial Lexicographic Preference Trees over Combinatorial Domains

Propositional and Predicate Logic - IV

On the Isomorphism Problem for Decision Trees and Decision Lists

Linear FPT Reductions and Computational Lower Bounds

Enumerating the edge-colourings and total colourings of a regular graph

Fault-Tolerant Consensus

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

Lecture 59 : Instance Compression and Succinct PCP s for NP

Theory of Computation Chapter 9

Partitions versus sets : a case of duality

Distribution of the Number of Encryptions in Revocation Schemes for Stateless Receivers

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Topics in Approximation Algorithms Solution for Homework 3

Lecture #14: NP-Completeness (Chapter 34 Old Edition Chapter 36) Discussion here is from the old edition.

Realization Plans for Extensive Form Games without Perfect Recall

More Completeness, conp, FNP,etc. CS254 Chris Pollett Oct 30, 2006.

Maximum Agreement Subtrees

Lecture 15 - NP Completeness 1

The Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization

A 3-approximation algorithm for the subtree distance between phylogenies

EXACT DOUBLE DOMINATION IN GRAPHS

Algorithmic aspects of tree amalgamation

Phylogenetic Networks with Recombination

Inferring a level-1 phylogenetic network from a dense set of rooted triplets

On the Subnet Prune and Regraft distance

k-protected VERTICES IN BINARY SEARCH TREES

Intro to Theory of Computation

Physical Mapping. Restriction Mapping. Lecture 12. A physical map of a DNA tells the location of precisely defined sequences along the molecule.

Binary Decision Diagrams

Computing Bounded Path Decompositions in Logspace

Induced subgraphs of graphs with large chromatic number. VI. Banana trees

Monadic Second Order Logic on Graphs with Local Cardinality Constraints

1 Fixed-Parameter Tractability

Single parameter FPT-algorithms for non-trivial games

Chapter 4: Computation tree logic

Basing Decisions on Sentences in Decision Diagrams

More on NP and Reductions

Complexity Theory Part II

Greedy Trees, Caterpillars, and Wiener-Type Graph Invariants

Computational Complexity of Bayesian Networks

Transcription:

Solving the Maximum Agreement Subtree and Maximum Compatible Tree problems on bounded degree trees LIRMM, Montpellier France 4th July 2006

Introduction The Mast and Mct problems: given a set of evolutionary trees, choose a subset of the species on which they agree (for a suitable definition of this notion). Applications: 1. obtain a consensus between several trees; 2. define similarity measures between trees; 3. other applications.

The Mast problem The Mast problem: [FG85] Given a collection T = {T 1,..., T k } of trees on a common label set L, find a tree T with the largest number of leaves, st T is included in every T i. we say that T is an agreement subtree for T. T 1 T 2 T a b c d e f a b c d e f a b e f

The Mct problem The Mct problem: [HS96] Given a collection T = {T 1,..., T k } of trees on a common label set L, find a tree T with the largest number of leaves, st T can be obtained from every T i by removing labels and resolving multifurcations. we say that T is a compatible tree for T. T 1 T 2 T a b c d e f a b c d e f a b c d e f

Bounded degree trees We have the following parameters: k is the number of input trees; n is the number of labels; N = O(kn) is the size of the instance; D is a bound on the maximum degree of the trees. Here we consider the parameterized complexity of the problems wrt D. Important result: Mast can be solved in time O(n D + kn 3 ) ([FPT95, Bry97]).

Questions and results Questions: Mast can be solved in O(n D + kn 3 ) time; but... can we do better, eg 2 D N or N D? what is the complexity of Mct? Results: Mast is W[1]-hard wrt D, and cannot be solved in time Φ(D)N o(d) unless Snp Se; Mct is W[1]-hard wrt D, and cannot be solved in time Φ(D)N o(2d/2) unless Snp Se.

Toolkit The class Fpt : A parameterized problem Π is in Fpt if on every instance (x, k), it can be solved in time f (k) x c for some function f and some constant c. For instance, Mast would be Fpt wrt D if we could solve it in time 2 D N. Goal of parameterized complexity: positive toolkit : techniques to derive FPT algorithms when possible; negative toolkit : techniques to show that a problem is unlikely to be FPT.

Toolkit Definition (fpt-reduction). Let Π, Π be two parameterized problems. We say that Π fpt-reduces to Π iff there are functions f, g, a constant c, and an algorithm A which from every instance (x, k) of Π, produces an instance (x, k ) of Π satisfying: A has running time f (k) x c ; k g(k); (x, k) Π iff (x, k ) Π. An fpt-reduction is linear if g is at most linearly increasing (ie if k = O(k)).

Toolkit The class W[1]. Contains the problems fpt-reducible to IS: given a parameter k, and a graph G, decide if G has an independent of size k? if Π is W[1]-hard, then Π cannot be solved in f (k) x c time unless Fpt = W[1]. The class W l [1]. Contains the problem which are reducible to IS by linear fpt-reductions. If Π is W l [1]-hard, then Π cannot be solved in f (k) x o(k) time unless Snp Se [Hua04].

Reduction for Mast Main result: Proposition. Mast[D] is W l [1]-hard. Consequences: Mast cannot be solved in time Φ(D)N c unless Fpt = W[1]; Mast cannot be solved in time Φ(D)N o(d) unless Snp Se.

Reduction for Mast Let G = (V, E) be a k-partite graph with partition V 1,..., V k. A partitioned independent set of G is a set I st: I is an independent set of G; I V i = 1 for every i. The problem Pis asks: given a k-partite graph without isolated vertices, decide if G has a partitioned independent set. Lemma. Pis[k] is W l [1]-hard.

Reduction for Mast Lemma. There exists a linear fpt-reduction from Pis[k] to Mast[D]. Sketch of the reduction. Let G = (V, E) be a k-partite graph with partition V 1,..., V k. We contruct a collection T of trees with degree k + 1. this ensures the linearity. We set L := V, T = {P} {Q e : e E}.

Reduction for Mast Control component. P the root has R 1,..., R k as child subtrees; P R 1 R 2 R k

Reduction for Mast Control component. P the root has R 1,..., R k as child subtrees; each R i is a rake-tree with label set V i. P R 1 R 2 R k

Reduction for Mast Control component. P the root has R 1,..., R k as child subtrees; each R i is a rake-tree with label set V i. P (P) = k. R 1 R 2 R k

Reduction for Mast Selection component. Q e for each e E let x, y be the two endpoints of e; then x V i, y V j. Q e is obtained from P, by P x y R i R j

Reduction for Mast Selection component. Q e for each e E let x, y be the two endpoints of e; then x V i, y V j ; Q e is obtained from P, by (i) removing x, y, Q e R i R j

Reduction for Mast Selection component. Q e for each e E let x, y be the two endpoints of e; then x V i, y V j ; Q e is obtained from P, by (i) removing x, y, (ii) creating a new node v e as child of the root Q e v e R i R j

Reduction for Mast Selection component. Q e for each e E let x, y be the two endpoints of e; then x V i, y V j ; Q e is obtained from P, by (i) removing x, y, (ii) creating a new node v e as child of the root, (iii) regrafting x, y as children of v e. Q e v e x y R i R j

Reduction for Mast Selection component. Q e for each e E let x, y be the two endpoints of e; then x V i, y V j ; Q e is obtained from P, by (i) removing x, y, (ii) creating a new node v e as child subtree of the root, (iii) regrafting x, y as children of v e. Q e v e x y (Q e ) = k + 1 R i R j

Reduction for Mast Lemma. G is a positive instance of Pis iff Mast(T ) k 1. if G has a partitioned independent set I = {x 1,..., x k } with x i V i, then T = (x 1,..., x k ) is an agreement subtree for T ; 2. if T has an agreement subtree T of size k, then: Control: T has the form (x 1,..., x k ) with x i V i ; Selection: there is no edge joining x i, x j in G. and thus I = {x 1,..., x k } is a partitioned independent set of G.

Reduction for Mct Main result: Proposition. Mct[2 D/2 ] is W l [1]-hard. Consequences: Mct cannot be solved in Φ(D)N c time unless Fpt = W[1]; Mct cannot be solved in Φ(D)N o(2d/2) time unless Snp Se.

Reduction for Mct Let G = (V, E) be a k-partite graph with partition V 1,..., V k. A 2-partitioned independent set of G is a set I st: I is an independent set of G; I V i = 2 for every i. The problem Pis-2 asks: given a k-partite graph G, does G have a 2-partitioned independent set? Lemma. The problem Pis-2[k] is W l [1]-hard.

Reduction for Mct Lemma. There exists a linear fpt-reduction from Pis-2[k] to Mct[2 D/2 ]. Sketch of the reduction. Let G = (V, E) be a k-partite graph with partition V 1,..., V k. We construct a collection T of trees with degree 2 log 2 k + 1. this ensures the linearity since with D 2 log 2 k, we have 2 D/2 = O(k). We set L := V, T = {P, P } {Q e : e E}.

Reduction for Mct Control component: P B binary tree, with leaves 1,..., k and height log 2 k ; P B log 2 k 1 2 k

Reduction for Mct Control component: P B binary tree, with leaves 1,..., k and height log 2 k ; on leaf i of B, we graft a rake-tree R i with label set V i, ie P = B[R 1,..., R k ]. P B log 2 k R 1 R 2 R k

Reduction for Mct Control component: P B binary tree, with leaves 1,..., k and height log 2 k ; on leaf i of B, we graft a rake-tree R i with label set V i, ie P = B[R 1,..., R k ]. P B log 2 k (P) = 2. R 1 R 2 R k

Reduction for Mct Control component: P let R i be the symmetric rake-tree of R i ; on leaf i of B, we graft the rake-tree R i, ie P = B[R 1,..., R k ]. P B log 2 k R 1 R 2 R k

Reduction for Mct Control component: P let R i be the symmetric rake-tree of R i ; on leaf i of B, we graft the rake-tree R i, ie P = B[R 1,..., R k ]. P B log 2 k (P ) = 2. R 1 R 2 R k

Reduction for Mct Selection component: Q e for each e E let x, y be the two endpoints of e; then x V i, y V j. Let λ i,j the least common ancestor of leaves i, j in B. Q e is obtained from P by P λ i,j x y R i R j

Reduction for Mct Selection component: Q e for each e E let x, y be the two endpoints of e; then x V i, y V j. Let λ i,j the least common ancestor of leaves i, j in B. Q e is obtained from P by (i) collapsing the edges joining i, j in P, Q e λ i,j x y R i R j

Reduction for Mct Selection component: Q e for each e E let x, y be the two endpoints of e; then x V i, y V j. Let λ i,j the least common ancestor of leaves i, j in B. Q e is obtained from P by (i) collapsing the edges joining i, j in P, (ii) removing x, y, Q e λ i,j R i R j

Reduction for Mct Selection component: Q e for each e E let x, y be the two endpoints of e; then x V i, y V j. Let λ i,j the least common ancestor of leaves i, j in B. Q e is obtained from P by (i) collapsing the edges joining i, j in P, (ii) removing x, y, (iii) creating a new node v e as child of λ i,j Q e λ i,j v e R i R j

Reduction for Mct Selection component: Q e for each e E let x, y be the two endpoints of e; then x V i, y V j. Let λ i,j the least common ancestor of leaves i, j in B. Q e is obtained from P by (i) collapsing the edges joining i, j in P, (ii) removing x, y, (iii) creating a new node v e as child of λ i,j, (iv) regrafting x, y as children of v e. Q e λ i,j v e x y R i R j

Reduction for Mct Selection component: Q e for each e E let x, y be the two endpoints of e; then x V i, y V j. Let λ i,j the least common ancestor of leaves i, j in B. Q e is obtained from P by (i) collapsing the edges joining i, j in P, (ii) removing x, y, (iii) creating a new node v e as child of λ i,j, (iv) regrafting x, y as children of v e. Q e λ i,j v e x y (Q e ) 2 log 2 k + 1. R i R j

Reduction for Mct Lemma. G positive instance of Pis-2 iff Mct(T ) 2k 1. if G has a 2-partitioned independent set I = {x1 1, x1 2,..., xk 1, x k 2} where x i 1, x i 2 V i, then T = B[(x1 1, x1 2 ),...(xk 1, x k 2 )] is a compatible tree for T ; 2. if T has a compatible tree T of size 2k, then: Control: T has the form B[(x1 1, x1 2 ),..., (xk 1, x k 2 )] with xi 1, x i 2 V i ; Selection: there is no edge joining vertices x p i, x q j in G. and thus I = {x1 1, x1 2,..., xk 1, x k 2 } is a 2-partitioned independent set of G.

Other results Other results on the parameterized complexity of Mast and Mct [GN06]: an algorithm solving Mct in time O(n 2d+2 + kn 3 ); algorithms solving Mast and Mct in time 2 2kd n 3 ; more results.

D. Bryant. Building trees, hunting for trees and comparing trees: theory and method in phylogenetic analysis. PhD thesis, University of Canterbury, Department of Mathemathics, 1997. C. R. Finden and A. D. Gordon. Obtaining common pruned trees. Journal of Classification, 2:255 276, 1985. M. Farach, T. M. Przytycka, and M. Thorup. On the agreement of many trees. Information Processing Letters, 55(6):297 301, 1995. S. Guillemot and F. Nicolas. Parameterized complexity of the maximum agreement subtree and maximum compatible tree problems, to appear. 2006. A. M. Hamel and M. A. Steel. Finding a maximum compatible tree is NP-hard for sequences and trees. Applied Mathematics Letters, 9(2):55 59, 1996. X. Huang. Parameterized complexity and polynomial-time approximation schemes. PhD thesis, Texas AM University, Department of Computer Science, 2004.