A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed

Similar documents
THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

On Controllability and Normality of Discrete Event. Dynamical Systems. Ratnesh Kumar Vijay Garg Steven I. Marcus

Tree-width and algorithms

Restricted b-matchings in degree-bounded graphs

Lecture 14 - P v.s. NP 1

On Languages with Very High Information Content

Structural Grobner Basis. Bernd Sturmfels and Markus Wiegelmann TR May Department of Mathematics, UC Berkeley.

Lecture 15 - NP Completeness 1

Lecture 9 : PPAD and the Complexity of Equilibrium Computation. 1 Complexity Class PPAD. 1.1 What does PPAD mean?

A few logs suce to build (almost) all trees: Part II

Properties of normal phylogenetic networks

Phylogenetic Networks, Trees, and Clusters

IBM Almaden Research Center, 650 Harry Road, School of Mathematical Sciences, Tel Aviv University, TelAviv, Israel

Dynamic Programming on Trees. Example: Independent Set on T = (V, E) rooted at r V.

ON THE COMPLEXITY OF SOLVING THE GENERALIZED SET PACKING PROBLEM APPROXIMATELY. Nimrod Megiddoy

Constructing brambles

Computing branchwidth via efficient triangulations and blocks

should be presented and explained in the combined species tree (Fitch, 1970; Goodman et al., 1979). The gene divergence can be the results of either s

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

4.1 Notation and probability review

The subject of this paper is nding small sample spaces for joint distributions of

Haplotyping as Perfect Phylogeny: A direct approach

The Multi-State Perfect Phylogeny Problem with Missing and Removable Data: Solutions via Integer-Programming and Chordal Graph Theory

A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES. 1. Introduction

The Complexity of Maximum. Matroid-Greedoid Intersection and. Weighted Greedoid Maximization

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.

arxiv: v1 [cs.dm] 29 Oct 2012

Lecture 8 - Algebraic Methods for Matching 1

Parametrizing Above Guaranteed Values: MaxSat. and MaxCut. Meena Mahajan and Venkatesh Raman. The Institute of Mathematical Sciences,

A Linear-Time Algorithm for the Terminal Path Cover Problem in Cographs

Tree sets. Reinhard Diestel

Cographs; chordal graphs and tree decompositions

A version of for which ZFC can not predict a single bit Robert M. Solovay May 16, Introduction In [2], Chaitin introd

1 Introduction Given a connected undirected graph G = (V E), the maximum leaf spanning tree problem is to nd a spanning tree of G with the maximum num

The Inclusion Exclusion Principle and Its More General Version

Lecture 24 : Even more reductions

Contributions to computational phylogenetics and algorithmic self-assembly

Polynomial-time Reductions

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139

Paths and cycles in extended and decomposable digraphs

On improving matchings in trees, via bounded-length augmentations 1

A Phylogenetic Network Construction due to Constrained Recombination

RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS

Branchwidth of chordal graphs

Bounds for the Zero Forcing Number of Graphs with Large Girth

X X (2) X Pr(X = x θ) (3)

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems

Strongly chordal and chordal bipartite graphs are sandwich monotone

interval order and all height one orders. Our major contributions are a characterization of tree-visibility orders by an innite family of minimal forb

Limitations of Algorithm Power

On the Complexity of the Reflected Logic of Proofs

Approximate Map Labeling. is in (n log n) Frank Wagner* B 93{18. December Abstract

1 Introduction The j-state General Markov Model of Evolution was proposed by Steel in 1994 [14]. The model is concerned with the evolution of strings

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Boundary cliques, clique trees and perfect sequences of maximal cliques of a chordal graph

Algorithms and Theory of Computation. Lecture 22: NP-Completeness (2)

Effects of Gap Open and Gap Extension Penalties

Abstract. We show that a proper coloring of the diagram of an interval order I may require 1 +

Precoloring extension on chordal graphs

Branching. Teppo Niinimäki. Helsinki October 14, 2011 Seminar: Exact Exponential Algorithms UNIVERSITY OF HELSINKI Department of Computer Science

Tree-width and planar minors

Three-dimensional Stable Matching Problems. Cheng Ng and Daniel S. Hirschberg. Department of Information and Computer Science

Sequences of height 1 primes in Z[X]

TR : Tableaux for the Logic of Proofs

Chapter 1. Comparison-Sorting and Selecting in. Totally Monotone Matrices. totally monotone matrices can be found in [4], [5], [9],

Upper and Lower Bounds for Tree-like. Cutting Planes Proofs. Abstract. In this paper we study the complexity of Cutting Planes (CP) refutations, and

Parsimony via Consensus

Efficient Approximation for Restricted Biclique Cover Problems

Extremal problems in logic programming and stable model computation Pawe l Cholewinski and Miros law Truszczynski Computer Science Department Universi

Counting and Constructing Minimal Spanning Trees. Perrin Wright. Department of Mathematics. Florida State University. Tallahassee, FL

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES

Coins with arbitrary weights. Abstract. Given a set of m coins out of a collection of coins of k unknown distinct weights, we wish to

Some operations preserving the existence of kernels

1 Introduction The relation between randomized computations with one-sided error and randomized computations with two-sided error is one of the most i

A.J. Kfoury y. December 20, Abstract. Various restrictions on the terms allowed for substitution give rise

Guaranteeing No Interaction Between Functional Dependencies and Tree-Like Inclusion Dependencies

EXACT DOUBLE DOMINATION IN GRAPHS

Perfect Phylogenetic Networks with Recombination Λ

max bisection max cut becomes max bisection if we require that It has many applications, especially in VLSI layout.

Parameterized Algorithms and Kernels for 3-Hitting Set with Parity Constraints

Vol. 2(1997): nr 4. Tight Lower Bounds on the. Approximability of Some. Peter Jonsson. Linkoping University

Advanced Combinatorial Optimization September 22, Lecture 4

Phylogenetic Networks with Recombination

Pebbling Meets Coloring : Reversible Pebble Game On Trees

MINIMUM DIAMETER COVERING PROBLEMS. May 20, 1997

PATH PROBLEMS IN SKEW-SYMMETRIC GRAPHS ANDREW V. GOLDBERG COMPUTER SCIENCE DEPARTMENT STANFORD UNIVERSITY STANFORD, CA

Preliminaries. Graphs. E : set of edges (arcs) (Undirected) Graph : (i, j) = (j, i) (edges) V = {1, 2, 3, 4, 5}, E = {(1, 3), (3, 2), (2, 4)}

The Complexity of Constructing Evolutionary Trees Using Experiments

and the polynomial-time Turing p reduction from approximate CVP to SVP given in [10], the present authors obtained a n=2-approximation algorithm that

7. F.Balarin and A.Sangiovanni-Vincentelli, A Verication Strategy for Timing-

Propositional Logic. What is discrete math? Tautology, equivalence, and inference. Applications

Using DNA to Solve NP-Complete Problems. Richard J. Lipton y. Princeton University. Princeton, NJ 08540

Fine Structure of 4-Critical Triangle-Free Graphs II. Planar Triangle-Free Graphs with Two Precolored 4-Cycles

Complexity of locally injective homomorphism to the Theta graphs

k-blocks: a connectivity invariant for graphs

On the mean connected induced subgraph order of cographs

This material is based upon work supported under a National Science Foundation graduate fellowship.

Reachability-based matroid-restricted packing of arborescences

Reconstruction of certain phylogenetic networks from their tree-average distances

Transcription:

Computer Science Technical Reports Computer Science 3-17-1994 A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed Richa Agarwala Iowa State University David Fernández-Baca Iowa State University, fernande@iastate.edu Follow this and additional works at: http://lib.dr.iastate.edu/cs_techreports Part of the Theory and Algorithms Commons Recommended Citation Agarwala, Richa and Fernández-Baca, David, "A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed" (1994). Computer Science Technical Reports. 15. http://lib.dr.iastate.edu/cs_techreports/15 This Article is brought to you for free and open access by the Computer Science at Iowa State University Digital Repository. It has been accepted for inclusion in Computer Science Technical Reports by an authorized administrator of Iowa State University Digital Repository. For more information, please contact digirep@iastate.edu.

A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed Abstract We present an algorithm for determining whether a set of species, described by the characters they exhibit, has a perfect phylogeny, assuming the maximum number of characters is fixed. This algorithm is simpler and faster than the known algorithms when the number of characters is at least 4. Disciplines Theory and Algorithms This article is available at Iowa State University Digital Repository: http://lib.dr.iastate.edu/cs_techreports/15

A Faster Algorithm for the Perfect Phylogeny Problem when the number of Characters is Fixed TR94-05 Richa Agarwala and David Fernandez-Baca March 22, 1994 Iowa State University of Science and Technology Department of Computer Science 226 Atanasoff Ames, IA 50011

A Faster Algorithm for the Perfect Phylogeny Problem when the Number of Characters is Fixed Richa Agarwala and David Fernandez-Baca Department of Computer Science, Iowa State University, Ames, IA 50011 March 17, 1994 Abstract We present an algorithm for determining whether a set of species, described by the characters they exhibit, has a perfect phylogeny, assuming the maximum number of characters is xed. This algorithm is simpler and faster than the known algorithms when the number of characters is at least 4. 1 Introduction A fundamental problem in biology is that of inferring the evolutionary history of a set of species, each of which is specied by the set of traits or characters that it exhibits [6, 7, 10, 11]. Information about evolutionary history can be conveniently represented by an evolutionary or phylogenetic tree, often referred to simply as a phylogeny. In one of the standard models, the problem can be expressed mathematically as follows. Let C = f1;:::;mg be the character set, and for every c 2C, let A c = f1;:::;r c g be the set of allowable states for character c. We write r to denote max c2c r c.aspecies s =(s 1 ;:::;s m ) isavector such that s 2A 1 A m ; s c is referred to as the state of character c for s, or the state of s on character c. We assume that if i 2A c, then there exists a species s 2S such that s c = i. A phylogeny for S is a tree whose vertices are species and every leaf is in S. Aphylogeny T for S is perfect if for each c 2Cand each state r c of that character, the nodes having state r c on character c form a subtree of T. The perfect phylogeny problem is to determine whether a given set of species S has a perfect phylogeny. IfS admits a perfect phylogeny, the set of characters C is said to be compatible. We should point out that in the biology literature the perfect phylogeny problem is more commonly known as the character compatibility problem [5]. In this context, one is frequently interested in computing a maximal set of compatible characters, since, in practice, character sets tend to be incompatible. The perfect phylogeny problem was shown to be NP-complete by Bodlaender et al. [3] and, independently, by Steel [13]. Linear time algorithms have been found for m = 3 [4, 8]. The perfect phylogeny problem is known to be polynomially equivalent to the problem of triangulating colored graphs [14] which led to a perfect phylogeny algorithm running in O((rm) m+1 + nm 2 ) time [12]. This is polynomial for every xed m but not very fast in practical terms. Bodlaender, Fellows, and Hallett [2] have shown that the 1

perfect phylogeny problem is hard for W [2], implying that it is unlikely to be solved by an algorithm whose running time is of the form O(f(m)rn) where f is an exponential function. In this paper, we present ao((r,n=m) m (rm) 2 ) algorithm, which is polynomial for every xed m and is faster and simpler than the known algorithms for m 4. Using the polynomial equivalence of triangulating colored graphs and perfect phylogeny [14], we get an algorithm with a running time of O(m k+2 k 2 ) for triangulating a k-colored graph having m edges. 2 Preliminaries We now state some denitions and preliminary results from [1]. Lemma 1 A set of species S has a perfect phylogeny if and only if every subset of S has one. Denition 1 Suppose T is a perfect phylogeny for S and let p be some vertex in T. The state of p on character c is forced if p lies on the path between vertices a and b in S such that a c = b c. (Observe that if this is the case, we must have p c = a c = b c.) Denition 2 Suppose G Sand let G 0 = S,G. D(G), the set of distinguishing characters of G, is the set of all c 2Csuch that for every a 2 G and every b 2 G 0, a c 6= b c. M(G), the set of common characters, is C,D(G). Obviously, D(G) =D(G 0 ) and M(G) =M(G 0 ). Denition 3 A pair (G; G 0 ) where G Sand G 0 = S,G is called a split if, for every character, the number of common character states between G and G 0 is at most one. A split (G; G 0 )isac-split if D(G) 6= ;. If (G; G 0 ) is a split (c-split), G and G 0 are called clusters (c-clusters). Observe that we can determine whether a partition (G; G 0 )ofs is a split in O(nm) time. Note also that if G is a cluster but not a c-cluster, then M(G) =C. Denition 4 Let (G; G 0 ) be a split. We say that (G; G 0 )isoftype Iif there exists an s 2 G such that for all c 2M(G), s c equals the unique common state between G and G 0 on character c and jg,fsgj; jg 0 j1. If (G; G 0 )isoftypei,we refer to s as a connecting species. If(G; G 0 ) is not of type I, we say that it is of type II. Lemma 2 If all c-splits are of type II, then, in every perfect phylogeny T of S, every species s 2S isaleaf in T. Denition 5 A cluster G is said to be compatible with a vector s if for every c 2M(G), s c equals the unique common state for character c between G and S,G. Denition 6 A subphylogeny T G for a cluster G is a perfect phylogeny for G[fxg, where x is a node such that G is compatible with x. Node x is referred to as connection of T G. 2

The next result implies that, in searching for a perfect phylogeny for S, we can restrict our attention to perfect phylogenies constructed entirely from subphylogenies. Lemma 3 Suppose S has no type I c-splits. Let G be a cluster. Then, G has a subphylogeny if and only if there exist pairwise disjoint c-clusters G 1 ; ;G k andavector x such that (i) for every c 2M(G), x c equals the (unique) common state for character c between G and S,G, (ii) [ k i=1 G i = G, and (iii) each G i is compatible with x and has a subphylogeny. Corollary 4 Suppose S has no type I c-splits. S has a perfect phylogeny if and only if there exist pairwise disjoint c-clusters G 1 ; ;G k andavector x such that (i) [ k i=1g i = S and (ii) each G i is compatible with x and has a subphylogeny. We assume that there are no duplicate species and for every character, there are at least two character states which are exhibited by at least two species. The latter assumption can be made without loss of generality. To prove this, suppose that there exists a character c 2Csuch that at most one of its states p c is exhibited by more than two species in S. Assume without loss of generality that c = m. We claim that a perfect phylogeny T for S on C exists if and only if a perfect phylogeny T 0 exists for S on C,fcg. Deriving T 0 from T is easy. T can be derived from T 0 as follows. V (T ) = f(u 1 ; ;u m,1 ;p m ):(u 1 ; ;u m,1 ) 2 V (T 0 )g[fs : s 2S;s m 6= p m g E(T ) = f(u; v) :((u 1 ; ;u m,1 ); (v 1 ; ;v m,1 )) 2 E(T 0 )g [f(s; (s 1 ; ;s m,1 ;p m )) : s 2S;s m 6= p m g: The next section describes a way of nding the c-clusters which are compatible with a vector x and can give a perfect phylogeny for S by using Corollary 4. Section 4 gives the basic algorithm for nding a perfect phylogeny for S. The algorithm builds subphylogenies for the c-clusters found using the method presented in Section 3 and eciently searches for pairwise disjoint c-clusters G 1 ; ;G k andavector x satisfying the conditions of Corollary 4. Section 5 improves on the basic algorithm by considering fewer c-clusters. 3 Finding c-clusters Given a species x, Kannan and Warnow [9] dened an equivalence relation E x as the transitive closure of the following relation R on S,fxg: (a; b) 2Rif there exists c 2Csuch that a c = b c 6= x c. It is clear from this denition that two species in S which are in the same equivalence class must be in the same component of T,fxg, for any perfect phylogeny T of S[fxg. The set of equivalence classes is denoted by (S,fxg)=x and can be computed in O(nm) time [9]. Note that if x is an internal node in any perfect phylogeny ons, then j(s, fxg)=xj2. In particular, we make the following remark. Proposition 5 If j(s,fxg)=xj =1for every x 2S, then S has no type I splits. We now reformulate a result in [9]. Lemma 6 Let G 2 (S,fxg)=x. IfS[fxg has a perfect phylogeny, then G is a c-cluster and it is compatible with x. 3

Proof: From the denition of E x, it follows that G is a cluster and G is compatible with x. Wenow show that each G 2 (S,fxg)=x is a c-cluster. The claim is trivially true if G = S. Otherwise, it suces to show that there exists a perfect phylogeny with no duplicate nodes for S[fxg such that the species in each component oft,fxg give us the equivalence classes of (S,fxg)=x. By Lemma 1 there exists a perfect phylogeny for each G [fxg where G S. Specically, let T 1 ; ;T k be perfect phylogenies with no duplicate nodes for G 1 [fxg; ;G k [ fxg, where (S,fxg)=x = fg 1 ; ;G k g. Since each G i is compatible with x, identifying the nodes for x from each T i gives us a perfect phylogeny for S[fxg with no duplicate nodes. To prove the claim, we need to show that x is a leaf in each T i. Suppose this is not true. Let H 1 ; ;H l be the species in components of T i,fxg. As G i is an equivalence class of (S,fxg)=x, there exists a species p 2 H j ; q 2 H j 0 for some j 6= j 0, and a character c 2Csuch that p c = q c 6= x c. But this implies that T i is not a perfect phylogeny, which gives us a contradiction. 2 Corollary 7 Let G 2 (S,fxg)=x. If S[fxg has a perfect phylogeny, then G [fxg has a perfect phylogeny and x is a leaf in every perfect phylogeny of G [fxg. Lemma 8 Suppose S has no type-i c-splits and S[fxg has a perfect phylogeny for some x 62 S. Then S=x is a set of c-clusters satisfying the conditions of Corollary 4. Proof: The result is a consequence of the fact that x 62 S and the following: 1. Each G 2S=x is a c-cluster and is compatible with x. This follows from Lemma 6. 2. The set of c-clusters S=x are pairwise disjoint and their union equals S. This is because E x is an equivalence relation on S. 3. Each G 2S=x has a subphylogeny. To prove this, note that since x is a connection for G, it is sucient to show that G [fxg has a perfect phylogeny. This follows from Lemma 1, since G [fxgs[fxg and S[fxg has a perfect phylogeny. 2 The c-clusters of interest are the ones which can give us a perfect phylogeny for S using Corollary 4 and Lemma 8. As there are Q m i=1 r i choices for x and each S=x can have at most n classes, the total number of c-clusters of interest is O(r m n); all such clusters can be found in O(r m nm) time. 4 The Basic Algorithm The maximum number of edges in any perfect phylogeny with no duplicate nodes is at most (r, 1)m because in a perfect phylogeny, nodes having the same character state on any character form a subtree. This gives us the following necessary but insucient condition for deciding existence of a perfect phylogeny for S. Proposition 9 If n>(r, 1)m +1, then S has no perfect phylogeny. We now present an algorithm which constructs a perfect phylogeny, if it exists, such that the adjacent nodes dier in exactly one character state. The steps carried out by the basic algorithm are as follows. 4

Step 0. If n>(r, 1)m + 1, then Return FAILURE. Step 1. Find if there exists a species x 2Ssuch that j(s,fxg)=xj2. If there is any such species, we get subproblems G [fxg for each G 2 (S,fxg)=x. By Lemma 1, S has a perfect phylogeny if and only if each of these subproblems has one. The details of this construction are given in [1]. We can now assume that for each species x 2S, j(s,fxg)=xj =1. From Proposition 5, S has no type I c-splits. Hence, a species x is a candidate for an internal node in any perfect phylogeny for S only if x 62 S and j(s,fxg)=xj2. Step 2. For each species x 62 S, nd S=x. Create a directed search graph W such that V (W ) = f[s; x] :x 62 S; js=xj2g[f[g; x] :x 62 S; js=xj2;g2s=xg E(W ) = f([g; x]; [S; x]):[g; x] 2 V (W )g[f([g 1 ; x 1 ]; [G 2 ; x 2 ]) : G 1 G 2 and x 1 diers from x 2 in exactly one character stateg Each node of W will have an associated boolean variable, initially FALSE. Step 3. Assign the value TRUE to every [G; x] 2 V (W ) such that jgj =1. Anodew =[G; x] with jgj 2 is said to be active if every w 0 2 V (W ) such that (w 0 ;w) 2 E(W ) has a truth value assigned to it. Do the following until every active node has a truth value. 1. Choose any active nodew that has no truth value assigned to it. 2. Make w TRUE if there exists a vector x such that [G 0 ; x] istrue for every [G 0 ; x] such that ([G 0 ; x];w) 2 E(W ). Otherwise, make w FALSE. Step 4. If there exists a node [S; x] which is TRUE, return SUCCESS. Otherwise, return FAILURE. All the steps can be done in O(r m+1 nm) time as the maximum number of edges entering a node is O(rnm) and the maximum number of outgoing edges is O(rm). Lemma 10 If there exists a perfect phylogeny for S[fxg then [G; x] is assigned TRUE for each G 2S=x. Proof: The proof is by induction on jgj. The base case for nodes [G; x] where jgj =1 is true from Step 3 and the fact that G is compatible with x. Suppose the claim holds for all nodes [G; x] where jgj <k. Consider a node [G; x] where jgj = k. Let T be a perfect phylogeny for G [fxg. By Corollary 7, x is a leaf in T. Let y be the rst node on the path starting at x in T with degree at least 3. There exists at least one such node as there are no type I splits and jgj 2. It is now easy to see that there exist pairwise disjoint clusters in S=y whose union gives G, each of which has a subphylogeny. By induction hypothesis, the corresponding nodes of W will be assigned TRUE. Thus, [G; y] will be TRUE and this value will be propagated to every node [G; p] such that p is on the path between y and x in T. Hence, [G; x] will eventually become TRUE. 2 Theorem 11 S hasaperfect phylogeny if and only if there exists a node [S; x] for some x which is TRUE. 5

Proof: Follows from Lemma 8, Lemma 10, and Steps 2 and 3 of the basic algorithm. 2 Constructing the perfect phylogeny: If the algorithm returns SUCCESS, we can build a perfect phylogeny by traversing W and constructing an in-tree D as follows. Choose any node[s; a] having value TRUE as the root of D. Next, do the following until every leaf [G; x] ind has jgj =1. Pick any leaf w =[G; x] ind such that jgj2. Let A w be any set of node such that (i) A w contains all w 0 =[G 0 ; y] 2 V (W ) such that (w 0 ;w) 2 E(W ), for some y 62 S and (ii) for every w 0 2 A w ;w 0 has been assigned the value TRUE. (Intuitively, A w is a set of nodes that led to w being assigned the value TRUE.) Add to D all the nodes in A w,as well as the edges (w 0 ;w) such that w 0 2 A w. After D has been constructed, we can build a perfect phylogeny T by disregarding the orientation of the edges of D and replacing each node[g; x] ofd with a node labeled x. 5 An Improved Algorithm The following lemma gives a way of reducing the number of internal nodes that need to be considered for building a perfect phylogeny. Lemma 12 Suppose there exists a species s 2Sand a character c 2Csuch that for every other s 0 2S;s c 6= s 0 c; i.e., s c is unique to species s. If there exists a perfect phylogeny for S, then there exists a perfect phylogeny for S such that no node except s has state s c for character c. Proof: State s c is never forced for any nodex 6= s, so a dierent assignment is possible for x c. Hence, as long as least one character state for character c is exhibited by more than one species, we can obtain a perfect phylogeny for S such that no node except s has state s c for character c. 2 If S has no type I splits, it is easy to see that for every species s 2S, there exists a character c 2Csuch that for every other s 0 2S;s c 6= s 0 c.for each s 2S, pick one such character. Let k c be the number of times character c is chosen in this process. From the above lemma, the number of internal nodes that need to be considered is Q m (r,k i=1 i) and in Step 2 of the algorithm in Section 4, we only need to consider the c-clusters derived from these internal nodes. As P m i=1 k i = n and Q m i=1(r, k i ) (r, P m i=1 k i =m) m, the running time of this improved algorithm is O((r, n=m) m (rnm)). Since n (r, 1)m +1 by Proposition 9, the running time of the new algorithm is O((r, n=m) m (rm) 2 ). Acknowledgements We thank Tandy Warnow and Sampath Kannan for providing us with the equivalence relation E x and their results in [9]. 6

References [1] R. Agarwala and D. Fernandez-Baca, A Polynomial-Time Algorithm for the Perfect Phylogeny Problem when the Number of Character States is Fixed, In Proceedings of the 34th Annual Symposium on Fundamentals of Computer Science (1993), pp. 140{147. [2] H. Bodlaender, M. Fellows, and M. T. Hallett, The parameterized complexity of DNA mapping, perfect phylogeny and other problems of \bounded width", manuscript. [3] H. Bodlaender, M. Fellows, and T. Warnow, Two strikes against perfect phylogeny, In Proceedings of the 19th International Colloquium on Automata, Languages, and Programming, Springer Verlag, Lecture Notes in Computer Science (1992), pp. 273{283. [4] H. Bodlaender and T. Kloks, A simple linear time algorithm for triangulating three-colored graphs, In Proceedings of the 9th Annual Symposium on Theoretical Aspects of Computer Science (1992), pp. 415{423. [5] W. H. E. Day and D. Sankoff, Computational complexity of inferring phylogenies by compatibility, Syst. Zool., Vol. 35, No. 2 (1986), pp. 224{229. [6] G. F. Estabrook, Cladistic methodology: A discussion of the theoretical basis for the induction of evolutionary history, Annual Review of Ecology and Systematics, 3 (1972), pp. 427{456. [7] W. M. Fitch, Aspects of molecular evolution, Annual Reviews of Genetics, 7 (1973), pp. 343{380. [8] S. Kannan and T. Warnow, Triangulating three-colored graphs, SIAM J. on Discrete Mathematics, 5 (1992), pp. 249{258. [9] S. Kannan and T. Warnow, Personal Communication. [10] W. J. Le Quesne, A method of selection of characters in numerical taxonomy, Syst. Zool., 18 (1969), pp. 201{205. [11] W. J. Le Quesne, Further studies based on the uniquely derived character concept, Syst. Zool., 21 (1972), pp. 281{288. [12] F. R. McMorris, T. Warnow, and T. Wimer, Triangulating vertex colored graphs, In Proceedings of the 4th Annual Symposium on Discrete Algorithms (1993), Austin, Texas. [13] M. A. Steel, The complexity of reconstructing trees from qualitative characters and subtrees, Journal of Classication, 9 (1992), pp. 91{116. [14] T. J. Warnow, Combinatorial Algorithms for Constructing Phylogenetic Trees, Ph.D. Thesis, University of California, Berkeley, May 1991. 7

OF IOWA STATE UNIVERSITY SCIENCE DEPARTMENT OF COMPUTER SCIENCE SCIENCE with PRACTICE Tech Report: TR94-05 Submission Date: March 22, 1994 AND TECHNOLOGY