COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE

M. STISSING, C. N. S. PEDERSEN, T. MAILUND AND G. S. BRODAL
Bioinformatics Research Center, and Dept. of Computer Science, University of Aarhus, Denmark

R. FAGERBERG
Dept. of Mathematics and Computer Science, University of Southern Denmark, Denmark

We present an algorithm for calculating the quartet distance between two evolutionary trees of bounded degree on a common set of $n$ species. The previous best algorithm has running time $O(d^2 n^2)$ when considering trees where no node is of more than degree $d$. The algorithm developed herein has running time $O(d^9 n \log n)$, which makes it the first algorithm for computing the quartet distance between non-binary trees which has a sub-quadratic worst case running time.

1. Introduction

The evolutionary relationship between a set of species is conveniently described as a tree, where the leaves represent the species and the inner nodes speciation events. Using different biological data, or different methods of inferring such trees (see e.g. Felsenstein [1] for an overview), can yield different inferred trees for the same set of species, and to study such differences in a systematic manner, one must be able to quantify them using well-defined and efficient methods. Several distance measures have been proposed [2–6], each having different properties and reflecting different aspects of biology. This paper concerns efficient computation of the quartet distance [6], a distance measure with several attractive properties [7, 8].

For an evolutionary tree, the quartet topology of four species is determined by the minimal topological subtree containing the four species. The four possible quartet topologies of four species are shown in Fig. 1. The three leftmost of these we denote butterfly quartets; the rightmost is a star quartet. Given two evolutionary trees on the same set of $n$ species, the quartet distance between them is the number of sets of four species for which the quartet topologies differ in the two trees.
For binary trees, the fastest method for computing the quartet distance between two trees runs in $O(n \log n)$ [9], but for trees of arbitrary degree, the fastest algorithms run in $O(n^3)$ (independent of the maximal degree) or $O(n^2 d^2)$ (where $d$ is the maximal degree in the tree) [10]. This paper focuses on trees where each inner node $v$ has degree at most $d$, where $d$ is a fixed constant. We develop an $O(d^9 n \log n)$ time and $O(d^8 n)$ space algorithm for computing the quartet distance between two such trees, based on the algorithm in Brodal et al. [9]. This is the first algorithm for computing the quartet distance between non-binary trees with a sub-quadratic worst case running time.

Current affiliation: Dept. of Statistics, University of Oxford, UK.

[Figure 1. Figures (a)–(d) show the four possible quartet topologies of species a, b, c, and d. Figures (e) and (f) show the two ordered butterfly quartet topologies induced by the butterfly quartet topology in (a).]

In Brodal et al. [9] the quartet distance was calculated as $\binom{n}{4}$ minus the number of shared quartets. We will adopt this approach, focusing on calculating shared quartets, noting that in our setting trees might include star quartets. We first consider calculating the number of shared butterfly quartets between two trees, and then extend the algorithm into calculating shared star quartets as well.

2. Terminology

An evolutionary tree is an unrooted tree where any node, $v$, is either a leaf or an inner node of degree $d_v$, where $3 \le d_v \le d$. Leaves are uniquely labelled by the elements of a set $S$ of species, where $|S| = n$. For an evolutionary tree $T$, the quartet topology of a set $\{a, b, c, d\} \subseteq S$ of four species is the topological subtree of $T$ induced by these species. The possible quartet topologies for species $a, b, c, d$ are shown in Fig. 1. An evolutionary tree with $n$ leaves gives rise to $\binom{n}{4}$ different quartet topologies. Butterfly quartet topologies are a pairing of the four species into two pairs, defined (see Fig. 1) by letting $a$ and $b$ be a pair if the path from $a$ to $b$ does not meet the path from $c$ to $d$. We view the (butterfly) quartet topology of a four-set of species $\{a, b, c, d\}$ as two oriented quartet topologies [9], given by the two possible orientations of the middle edge of the topology, see Fig. 1. An oriented quartet topology is thus an ordered pair of two-sets, e.g. $(\{a, b\}, \{c, d\})$. The number of oriented quartet topologies of a tree is twice the number of its butterfly (unoriented) quartet topologies. In the rest of this paper, until Sect. 6, by a quartet we mean an oriented quartet topology, and we use the notation $ab|cd$ for $(\{a, b\}, \{c, d\})$. Let $Q$ be the set of all possible quartets of $S$, and let $Q_T \subseteq Q$ denote the set of quartets in an evolutionary tree $T$.
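To make the definitions above concrete, the following Python sketch decides quartet topologies directly from the path-based definition of butterfly quartets and computes the quartet distance by brute force over all $\binom{n}{4}$ four-sets. It is only a small-tree reference oracle with roughly $O(n^5)$ running time, not the algorithm developed in this paper; trees are assumed to be adjacency dictionaries, and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def tree(edges):
    """Adjacency dict of an unrooted tree given as an edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    return adj


def path_nodes(adj, s, t):
    """Set of nodes on the unique s-t path, read off BFS parent pointers."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    nodes = set()
    while t is not None:
        nodes.add(t)
        t = parent[t]
    return nodes


def topology(adj, a, b, c, d):
    """Butterfly pairing {{p, q}, {r, s}} of four species, or "star"."""
    for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        # p and q form a pair iff the p-q path does not meet the r-s path
        if path_nodes(adj, p, q).isdisjoint(path_nodes(adj, r, s)):
            return frozenset({frozenset({p, q}), frozenset({r, s})})
    return "star"


def quartet_distance(adj1, adj2, species):
    """Number of four-sets of species whose quartet topologies differ."""
    return sum(topology(adj1, *q) != topology(adj2, *q)
               for q in combinations(sorted(species), 4))


# ab|c|de, ac|b|de and a five-leaf star; inner node names are arbitrary
T1 = tree([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("y", "z"), ("d", "z"), ("e", "z")])
T2 = tree([("a", "x"), ("c", "x"), ("x", "y"), ("b", "y"), ("y", "z"), ("d", "z"), ("e", "z")])
STAR = tree([("h", s) for s in "abcde"])
print(quartet_distance(T1, T2, "abcde"), quartet_distance(T1, STAR, "abcde"))  # 2 5
```

For the trees ab|c|de and ac|b|de only the four-sets {a, b, c, d} and {a, b, c, e} change topology, so their distance is 2; against the five-leaf star, all five four-sets differ.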
We will associate quartets of $Q$ to inner nodes $v$ of $T$, such that $ab|cd$ is associated to $v$ if $v$ is the node where the paths from $c$ to $a$ and from $d$ to $a$ meet (see Fig. 1, right hand side). In the terminology of Christiansen et al. [10] these are all the quartets claimed by edges pointing to $v$. By $Q_v$ we denote the set of all quartets associated to $v$. Having the trees incident to $v$, $T_1, T_2, \ldots, T_{d_v}$ (see Fig. 2), a quartet $ab|cd$ is associated to $v$ if and only if $a$ and $b$ are in the same subtree and $c$ and $d$ are in two different subtrees, both different from the one containing $a$ and $b$. The total number of quartets associated to $v$ is then

$$|Q_v| = \sum_{i} \sum_{j \ne i} \sum_{k > j,\, k \ne i} \binom{|T_i|}{2} |T_j| |T_k|,$$

where $i$, $j$, and $k$ range over $1, \ldots, d_v$, and $|T|$ denotes the number of leaves in $T$, which we call the size of $T$.

[Figure 2. An inner node $v \in T$ with incident subtrees $T_1, \ldots, T_6$.]

The main strategy for finding the shared quartets between two trees, $T$ and $T'$, is, for each $v$ in $T$, to count how many of the quartets associated with $v$ are also quartets of $T'$, and to calculate the sum over all $v$, $\sum_{v \in T} |Q_v \cap Q_{T'}|$. Doing this, we will relate quartets to colourings of the elements in $S$, using the colours $A, B_1, B_2, \ldots, B_{d-2}, C$. For an internal node $v$ in $T$, we will say that $S$ is coloured according to $v$ if all leaves in each subtree incident to $v$ are coloured using one colour and no other subtree has its leaves coloured with this colour. Having a colouring of $S$ and a quartet $ab|cd$, we say that the quartet is compatible with the colouring if $c$ and $d$ have different colours and $a$ and $b$ have a third colour. These almost identical definitions give us the following lemma, similar to Brodal et al. [9], Lemma 1.

Lemma 2.1. When $S$ is coloured according to a choice of $v$ in $T$, the set of possible quartets compatible with the colouring is exactly the set $Q_v$ of quartets associated with $v$. Consequently, if $S$ is coloured according to $v$ in $T$, the quartets in $Q_{T'}$ compatible with this colouring are exactly the quartets associated with $v$ that are also quartets of $T'$.

The algorithm will, for each $v$ in $T$, ensure a colouring according to $v$ and then count the number of quartets of $T'$ compatible with this colouring. In order to do this colouring, we will maintain pointers between the elements of $S$ and the leaves of $T$ and $T'$, and vice versa.

3. The Basic Algorithm $O(d^9 n \log^2 n)$

In this section we expand the idea given above into an algorithm for calculating the shared quartets between $T$ and $T'$ with running time $O(d^9 n \log^2 n)$. The algorithm colours $S$ according to nodes $v$ (using the procedure colourleaves(u, X), which colours all leaves in the subtree $U$ rooted at $u$ with the colour $X$) and uses a hierarchical decomposition tree $H_{T'}$ in counting the number of quartets in $T'$ compatible with this colouring, shared(v, T'). The hierarchical decomposition tree, described in detail in Sect. 5, enables a change of colour of $k$ leaves in time $O(d^9(k + k \log \frac{n}{k}))$ and achieves $O(1)$ time for calculating shared(v, T'). The hierarchical decomposition tree $H_{T'}$ is constructible in $O(d^8 n)$ time and $O(d^8 n)$ space.

A pseudocode version of the algorithm is given in Alg. 1. The algorithm assumes $T$ has been rooted in an arbitrary leaf. Let $|v|$ denote the number of leaves in the subtree rooted at $v$, and call this the size of $v$.
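A Python sketch of the basic algorithm may help in following the description of Alg. 1 below. It is a sketch under stated assumptions, not the paper's implementation: each inner node's children are sorted by size, the largest playing the role of Large(v) and the others of Small_1(v), Small_2(v), ...; and shared(v, T') is evaluated naively, by summing over every inner node $u$ of $T'$ the number of quartets associated with $u$ that are compatible with the current colouring (the node function $F$ of Sect. 5), instead of using the hierarchical decomposition tree, so each call takes at least linear time rather than $O(1)$. Colours are the strings "A" and "C" and tuples ("B", i), trees are adjacency dictionaries, and all names are hypothetical. The returned value is $|Q_T \cap Q_{T'}|$ over oriented quartets, i.e. twice the number of shared butterfly quartets.

```python
from collections import Counter
from itertools import combinations
from math import comb


def tree(edges):
    """Adjacency dict of an unrooted tree given as an edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    return adj


def F(vectors):
    """Quartets ab|cd associated with one node and compatible with the colouring:
    a, b share a subtree and a colour x; c, d lie in two other subtrees and have
    colours y != z, both different from x. One colour Counter per subtree."""
    total = 0
    for i, vi in enumerate(vectors):
        for vj, vk in combinations(vectors[:i] + vectors[i + 1:], 2):
            for x, nx in vi.items():
                for y, ny in vj.items():
                    for z, nz in vk.items():
                        if len({x, y, z}) == 3:
                            total += comb(nx, 2) * ny * nz
    return total


def leaf_colours(adj, colour, start, blocked):
    """Colour counts of the leaves reachable from start without passing blocked."""
    counts, seen, stack = Counter(), {blocked, start}, [start]
    while stack:
        u = stack.pop()
        if len(adj[u]) == 1:
            counts[colour[u]] += 1
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return counts


def shared(adj2, colour):
    """Naive stand-in for shared(v, T'): sum of F over all inner nodes of T'."""
    return sum(F([leaf_colours(adj2, colour, w, u) for w in adj2[u]])
               for u in adj2 if len(adj2[u]) > 1)


def count_shared(adj1, adj2):
    """Alg. 1 with T = adj1 rooted at a leaf; returns |Q_T ∩ Q_T'| (oriented quartets)."""
    root = next(u for u in adj1 if len(adj1[u]) == 1)
    children, stack = {root: []}, [root]
    while stack:                                   # orient T away from the root leaf
        u = stack.pop()
        for w in adj1[u]:
            if w not in children:
                children[u].append(w)
                children[w] = []
                stack.append(w)
    below = {}

    def leaves_below(v):
        if v not in below:
            below[v] = ([v] if not children[v] else
                        [leaf for c in children[v] for leaf in leaves_below(c)])
        return below[v]

    colour = {u: "A" for u in adj1 if len(adj1[u]) == 1}
    colour[root] = "C"

    def colourleaves(u, x):
        for leaf in leaves_below(u):
            colour[leaf] = x

    def count(v):
        if not children[v]:                        # v is a leaf of T
            colourleaves(v, "C")
            return 0
        kids = sorted(children[v], key=lambda c: len(leaves_below(c)))
        large, smalls = kids[-1], kids[:-1]        # Large(v); Small_1(v), Small_2(v), ...
        for i, s in enumerate(smalls, 1):
            colourleaves(s, ("B", i))
        res = shared(adj2, colour)                 # S is coloured according to v here
        for s in smalls:
            colourleaves(s, "C")
        res += count(large)
        for s in smalls:
            colourleaves(s, "A")
            res += count(s)
        return res

    return count(children[root][0])


T1 = tree([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("y", "z"), ("d", "z"), ("e", "z")])
T2 = tree([("a", "x"), ("c", "x"), ("x", "y"), ("b", "y"), ("y", "z"), ("d", "z"), ("e", "z")])
STAR = tree([("h", s) for s in "abcde"])
print(count_shared(T1, T1), count_shared(T1, T2), count_shared(T1, STAR))  # 10 6 0
```

On these trees the sketch reports 10, 6 and 0: the tree ab|c|de shares all 5 of its butterfly quartets with itself, 3 with ac|b|de, and none with the star. The value 10 also agrees with $\sum_v |Q_v| = 3 + 4 + 3$, obtained from the formula for $|Q_v|$ in Sect. 2 with subtree sizes (1, 1, 3), (2, 1, 2) and (1, 1, 3) at the inner nodes x, y and z.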
A simple traversal lets us annotate each node $v$ such that it knows its largest child, Large(v) (in case of a tie we arbitrarily select one), and which of its children are not the largest, Small_i(v). Let Small_i(v) denote the $i$'th smallest subtree, with respect to the number of leaves in this subtree. Prior to the first call of the algorithm, the root of $T$ is coloured $C$ and all (other) leaves are coloured $A$. The algorithm is initially called with the single child of the root of $T$.

The algorithm recurses through the entire tree, summing the number of shared quartets between $v$ and $T'$, $|Q_v \cap Q_{T'}|$, for each $v$, ultimately calculating $|Q_T \cap Q_{T'}| = \sum_{v \in T} |Q_v \cap Q_{T'}|$. The algorithm colours the leaves according to $v$ and then counts the number of shared quartets. It then recurses, first on the largest child of $v$, Large(v), and then on the smaller children of $v$, Small_i(v). Before recursing on a node $v$ the algorithm ensures that all leaves below $v$ are coloured $A$. Returning from the recursion, the algorithm ensures that any leaf below $v$ is coloured $C$. We see that the algorithm colours a leaf only when this leaf is in a smaller subtree, Small_i(v), of some $v$ on which count(v) is invoked. As $|v|$ is at least twice the size of any Small_i(v), any leaf can be coloured at most $O(\log n)$ times.

Algorithm 1 count(v, T'): count the number of shared butterfly quartets between the subtree rooted at v and T'
Require: v is a non-root node of T, all leaves below v are coloured A, all leaves not in v are coloured C.
Ensure: Res is the number of quartets shared between nodes in v and T'. All leaves in v are coloured C.
  if v is a leaf then
    colourleaves(v, C)
    Res ← 0
  else
    Res ← 0
    for all Small_i(v) do colourleaves(Small_i(v), B_i)
    Res ← Res + shared(v, T')
    for all Small_i(v) do colourleaves(Small_i(v), C)
    Res ← Res + count(Large(v), T')
    for all Small_i(v) do
      colourleaves(Small_i(v), A)
      Res ← Res + count(Small_i(v), T')
  return Res

As the hierarchical decomposition tree enables the change of colour of $k$ leaves in time $O(d^9(k + k \log \frac{n}{k})) \subseteq O(d^9 k \log n)$, we can charge this by letting each colouring of a leaf cost $O(d^9 \log n)$. As colouring is the predominant time-consuming factor, the entire algorithm then runs in time $O(d^9 n \log^2 n)$. The space used is dominated by the space used by the hierarchical decomposition tree, which is $O(d^8 n)$, cf. Sect. 5.

4. The Improved Algorithm $O(d^9 n \log n)$

The analysis of the basic algorithm above shows that if any node $v$ uses time $O(d^9 \log n \sum_i |\mathrm{Small}_i(v)|)$, then the entire algorithm uses time $O(d^9 n \log^2 n)$. This is often referred to as the smaller-half trick:

Lemma 4.1 (Smaller-half trick). If any inner node $v$ supplies a term $c_v = \sum_i |\mathrm{Small}_i(v)|$ and any leaf a term $c_v = 0$, then the sum over all nodes $\sum_v c_v \le n \log n$.

This is easily proved by induction. The analysis above is an instance of this lemma, with every term multiplied by $d^9 \log n$. The improved algorithm below uses an extended smaller-half trick, which is also easily proved by induction.

Lemma 4.2 (Extended smaller-half trick). In a rooted tree, if any inner node $v$ supplies a term $c_v = \sum_i |\mathrm{Small}_i(v)| \log \frac{|v|}{|\mathrm{Small}_i(v)|}$ and any leaf a term $c_v = 0$, then the sum over all nodes $\sum_v c_v \le n \log n$.

The main observation in achieving the improved algorithm comes from noting that, whenever the basic algorithm count(v) is called, all leaves outside the subtree rooted at $v$ will have the colour $C$, and these leaves will not change their colour while count(v) is being processed.
This, of course, also applies to the leaves of $T'$ coloured $C$. We will therefore, in certain cases, construct a compact representation of $T'$ by contracting nodes of $T'$ coloured $C$. We will consider any constructed $T'$ as having an associated hierarchical decomposition tree $H_{T'}$, see below.

Algorithm 2 fastcount(v, T'): count the number of shared butterfly quartets between the subtree rooted at v and T'
Require: v is a non-root node of T, all leaves in v are coloured A, all leaves not in v are coloured C.
Ensure: Res equals the number of quartets shared between v and T'. All leaves in v are coloured C.
  Res ← 0
  if v is a leaf then
    colourleaves(v, C, T')
  else
    for all Small_i(v) do colourleaves(Small_i(v), B_i, T')
    Res ← Res + shared(v, T')
    for all Small_i(v) do colourleaves(Small_i(v), C, T')
    for all Small_i(v) do T'_i ← contract(Small_i(v), extract(Small_i(v), T'))
    if |T'| > 5|Large(v)| then T' ← contract(Large(v), T')
    Res ← Res + fastcount(Large(v), T')
    for all Small_i(v) do
      colourleaves(Small_i(v), A, T'_i)
      Res ← Res + fastcount(Small_i(v), T'_i)
  return Res

A pseudocode version of the improved algorithm is given in Alg. 2. If we ensure that $T'$ (and thus $H_{T'}$) is of size $O(|v|)$ whenever fastcount(v, T') is processed, we know that $k$ leaves can have their colour updated in time $O(d^9(k + k \log \frac{|v|}{k}))$. The extended smaller-half trick then ensures that the total time spent colouring is $O(d^9 n \log n)$. The algorithm resembles the basic algorithm except for contract(u, Y) and extract(u, Y), the details of which are given in Sect. 5. For the analysis of the improved algorithm it suffices to note that contract(u, Y) makes a compact representation of $Y$ by contracting anything in $Y$ except the leaves in $U$. This yields a tree with no more than $4|U|$ nodes in time $O(d^9 |Y|)$. Likewise, extract(u, Y) makes a compact representation of $Y$. This representation also lets the leaves of $U$ in $Y$ remain intact; all other nodes are contracted. The leaves of the arising tree are (implicitly) coloured $C$. The operation extract(u, Y) completes in $O(d^9 |U| \log \frac{|Y|}{|U|})$ time and yields a tree of size $O(|U| \log \frac{|Y|}{|U|})$. When constructing such a new tree, as a result of contract(u, Y), we will update the pointers of $S$ to point to the leaves of the newly created tree. This manipulation of $S$ enables the colouring of leaves in the newly created trees.
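The extended smaller-half trick (Lemma 4.2) is what brings the colouring cost down to $O(d^9 n \log n)$. One way to see why it holds: a fixed leaf contributes a term $\log \frac{|v|}{|\mathrm{Small}_i(v)|}$ only at ancestors $v$ where it lies in a smaller child, and these are a subset of the non-negative terms $\log \frac{|v|}{|w|}$ along its root path ($w$ the child of $v$ on that path), which telescope to $\log n$. The Python sketch below checks Lemmas 4.1 and 4.2 numerically on random rooted trees with at most $d - 1$ children per inner node, using base-2 logarithms. It illustrates the lemmas rather than proving them, and the random tree generator and function names are hypothetical.

```python
import math
import random


def random_rooted_tree(n, max_children, rng):
    """Random rooted tree with n leaves: a leaf is [], an inner node is a list
    of 2..max_children subtrees (max_children >= 2)."""
    if n == 1:
        return []
    k = rng.randint(2, min(max_children, n))
    cuts = sorted(rng.sample(range(1, n), k - 1))
    sizes = [hi - lo for lo, hi in zip([0] + cuts, cuts + [n])]
    return [random_rooted_tree(s, max_children, rng) for s in sizes]


def smaller_half_sums(t):
    """Return (|t|, sum_v sum_i |Small_i(v)|,
    sum_v sum_i |Small_i(v)| * log2(|v| / |Small_i(v)|))."""
    if not t:                                   # a leaf contributes c_v = 0
        return 1, 0, 0.0
    parts = [smaller_half_sums(child) for child in t]
    size = sum(p[0] for p in parts)
    plain = sum(p[1] for p in parts)
    extended = sum(p[2] for p in parts)
    for s in sorted(p[0] for p in parts)[:-1]:  # every child but one largest is a Small_i(v)
        plain += s
        extended += s * math.log2(size / s)
    return size, plain, extended


rng = random.Random(2007)
for n in (2, 50, 500):
    for d in (3, 4, 8):
        size, plain, extended = smaller_half_sums(random_rooted_tree(n, d - 1, rng))
        bound = n * math.log2(n)
        print(n, d, plain <= bound, extended <= bound + 1e-9)
```

Since both lemmas hold for every rooted tree, both comparisons print True for every generated tree; the small tolerance only guards against floating-point rounding in the logarithms.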
Regarding correctness, assuming the leaves in $v$ are coloured $A$ and the leaves outside $v$ are coloured $C$ when fastcount(v, T') is called, the algorithm will, like the basic algorithm, ensure a colouring according to $v$ prior to the call shared(v, T'). Furthermore, before recursing on Small_i(v) (or Large(v)) the algorithm ensures that the tree used in the recursion is coloured such that all leaves in Small_i(v) (Large(v)) are coloured $A$ and the leaves outside are coloured $C$. The correctness of the algorithm follows from the correctness of the basic algorithm.

For time complexity, we see that $T'$ is of size $O(|v|)$ when fastcount(v, T') is called. This implies that the trees $T'_i$ are each of size $O(|\mathrm{Small}_i(v)|)$. The time used in constructing these is $O(d^9 \sum_i |\mathrm{Small}_i(v)| \log \frac{|v|}{|\mathrm{Small}_i(v)|})$, i.e. construction time is dominated by the time taken colouring the leaves, colourleaves(Small_i(v), B_i, T'). We note that each $H_{T'_i}$ is constructible in time $O(d^8 |T'_i|)$, see Sect. 5, i.e. this is dominated by the time used obtaining $T'_i$ by contraction. Contracting the larger part of $T'$, contract(Large(v), T'), completes in time $O(d^9 |T'|)$ and yields a tree of size at most $4|\mathrm{Large}(v)|$ (see Lemma 5.5 below). As we only do this when $5|\mathrm{Large}(v)| < |T'|$, each such contraction shrinks $T'$ to less than $\frac{4}{5}$ of its size, so the total time spent on repeatedly contracting larger parts of a tree $T'$ is bounded by the geometric series $d^9 |T'| \sum_{k \ge 0} (\frac{4}{5})^k = 5 d^9 |T'|$. This implies that the time spent contracting $T'$ is linear in the initial size of $T'$ (times $d^9$), i.e. the time spent is bounded by the time used constructing $T'$, namely by contract(extract(Small_i(v), T')). Ultimately this implies that the algorithm completes in $O(d^9 n \log n)$ time.

Regarding the space used by the algorithm, we see that the only additional space it consumes is when creating the $T'_i$'s (and the corresponding $H_{T'_i}$'s) at each node $v \in T$; in total no more than the maximal space used on any root-to-leaf path $P_j$ in $T$, i.e. $O(d^8 \max_j \sum_{v \in P_j} \sum_i |\mathrm{Small}_i(v)|)$. Consider a path $P_j$: there will be a number of nodes $v$ where both $v$ and some Small_i(v) are on the path. The total space consumed by all such $v$ is no more than $d^8 \sum_{i \ge 0} n / 2^i \in O(d^8 n)$; that is, at each such node we store at most the smaller children of what is left, and as we know that Small_i(v) is on the path, at least half of what is left is cut off. The rest of the path consists of pairs $v$ and Large(v). For each such pair we consume $d^8 \sum_i |\mathrm{Small}_i(v)|$ space; we might think of this as marking the leaves in each of the Small_i(v). However, as no other pair $v$, Large(v) can mark an already marked leaf, we conclude that these pairs consume $O(d^8 n)$ space. In total $O(d^8 n)$ space is used.

5. Hierarchical Decomposition Tree

The algorithms developed use the hierarchical decomposition tree $H_T$ heavily. The data structure can, in constant time, calculate the number of quartets in an evolutionary tree $T$ compatible with the current colouring of $S$. The data structure allows a change of the colour of $k$ elements of $S$ in time $O(d^9(k + k \log \frac{n}{k}))$, where $n$ is the number of leaves in $T$. In the following we describe how to build and update such a tree, inspired by the approach in Brodal et al. [9]

[Figure 3. A tree $T$ and a hierarchical decomposition of this tree. $H_T$ is the hierarchical decomposition tree corresponding to the shown hierarchical decomposition of $T$.]

The hierarchical decomposition of $T$ is based on the notion of components. A component $C$ of $T$ is a connected subset of nodes in $T$. An external edge of $C$ is an edge connecting a node in $C$ with a node outside of $C$. The number of external edges is the degree of $C$. We will allow two types of components:
(1) Simple components containing a single node of $T$, see Fig. 4(a), 4(b).
(2) Composite components composing two other components, where both of these are of degree two, see Fig. 4(c), or at least one of these is of degree one, see Fig. 4(d), 4(e).

[Figure 4. Possible components: (a), (b) Simple components, a leaf and an inner node component respectively. (c)–(e) Composite components: (c) Composing two components of degree two. (d) Composing a component of degree $d_C$ with a component of degree one. (e) Composing two degree one components, a special case of (d).]

Letting each node of $T$ be a component by itself, a hierarchical decomposition of $T$ is a set of components created by repeatedly composing these. Note that the degree of a composite component will be no more than the maximum degree of the components it is composed of. In decomposing $T$, note that the current set of components forms a tree, hence there will always be at least one component of degree 1, and we can therefore always continue composing until we are left with a component containing all simple components of $T$.

Having a hierarchical decomposition of $T$ including a component containing all simple components of $T$, we can in a natural way view this as a tree. A hierarchical decomposition tree $H_T$ for $T$ is a rooted binary tree with leaves corresponding to simple components of $T$ and inner nodes corresponding to composite components (components in a hierarchical decomposition of $T$), see Fig. 3. An inner node $v$, with children $v'$ and $v''$, corresponds to the component $C$ arising when the two components $C'$ and $C''$, corresponding to the children of the node, are composed. The root, $r$, corresponds to a component containing all simple components of $T$. In this sense many hierarchical decomposition trees exist. We will show how to construct a locally-balanced hierarchical decomposition tree. A rooted binary tree with $n$ nodes is $\lambda$-locally-balanced if for all nodes $v$ in the tree, the height of the subtree rooted at $v$ is at most $\lambda(1 + \log |v|)$, where $|v|$ is the number of leaves in the subtree and the height is the maximal number of edges on any root-to-leaf path. The following lemma is an extension of Brodal et al. [9], Lemma 3.

Lemma 5.1. For any unrooted tree $T$ with $n$ nodes of degree at most $d$, a $6d$-locally balanced hierarchical decomposition tree $H_T$ can be computed in time $O(n)$.

The following lemma from Brodal et al. [9] bounds the number of nodes on $k$ root-to-leaf paths in a hierarchical decomposition tree.

Lemma 5.2. The union of $k$ root-to-leaf paths in a $\lambda$-locally balanced rooted binary tree with $n$ leaves contains at most $k(3\lambda + 4) + 2\lambda k \log \frac{n}{k}$ nodes.

Having an evolutionary tree $T$ with $n$ leaves and the associated hierarchical decomposition tree $H_T$, we want to count the number of quartets in $T$ compatible with the current colouring of $S$ in constant time. Further, when $k$ elements of $S$ change their colour, we should handle this update in time $O(d^9(k + k \log \frac{n}{k}))$.

We will associate functions and vectors to the nodes of $H_T$. At any node $v$ in $H_T$, having the associated component $C$, the stored vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)$ holds the number of leaves contained in $C$ of colours $A, B_1, B_2, \ldots, B_{d-2}$ and $C$, respectively. If $C$ is of degree $d_C$, the function $F$ stored at $v$ is a function of $d_C$ vector variables. The function counts the number of quartets associated to any node in $C$ that are compatible with the current colouring of $S$. This implies that the function stored at the root of $H_T$ counts the total number of quartets in $T$ compatible with the current colouring of $S$. Furthermore, since the component associated to the root of $H_T$ has 0 external edges, the function stored there is constant. The elements $\alpha_{i,j}$ of the vector variables $\alpha_i$ of $F$ correspond to the number of leaves coloured with the $j$'th colour in the part of $T$reached through the $i$'th external edge of $C$.

First we describe how to associate the vectors and functions to the leaves of $H_T$, that is, the simple components of $T$. If $v$ has an associated component of degree 1, i.e. it represents a leaf $l$ in $T$, having the colour $A, B_1, B_2, \ldots, B_{d-2}$ or $C$, the vector stored at $v$ is $(1, 0, \ldots, 0, 0), (0, 1, \ldots, 0, 0), \ldots, (0, 0, \ldots, 1, 0)$ or $(0, 0, \ldots, 0, 1)$, respectively. Since the number of quartets associated to $l$ is 0, the function stored at $v$ is identically zero: $F(\alpha_1) = 0$. Otherwise, if $v$, with associated component $C$ of degree $d_C$, represents an internal node $u$ in $T$, the tuple stored here is $(0, 0, \ldots, 0, 0)$, as the component contains no leaves of any colour. The function $F$ stored here counts the number of quartets associated to $u$ compatible with the colouring of $S$. Recall that a quartet $ab|cd$ associated to $u$ has $a$ and $b$ in the same subtree incident to $u$, and $c$ and $d$ in two different other subtrees. Further, if $ab|cd$ is compatible with the colouring of $S$, $a$ and $b$ have the same colour, and $c$ and $d$ have two different colours, both different from that of $a$ and $b$. $F$ is then

$$F(\alpha_1, \alpha_2, \ldots, \alpha_{d_C}) = \sum_{i=1}^{d_C} \sum_{j \ne i} \sum_{k > j,\, k \ne i} \; \sum_{x=1}^{d} \sum_{y \ne x} \sum_{z \ne x,\, y} \binom{\alpha_{i,x}}{2} \alpha_{j,y} \alpha_{k,z}.$$

We now turn to the tuples and functions associated to the inner nodes of $H_T$. The inner node $v$, with children $v'$ and $v''$, will store the vector $\alpha' + \alpha''$, assuming $v'$ and $v''$ store the vectors $\alpha'$ and $\alpha''$, respectively. Letting $F'$ and $F''$ be the functions stored at $v'$ and $v''$, we express the function $F$ stored at $v$. Let $C$ be the component corresponding to $v$, and likewise $C'$ and $C''$ for $v'$ and $v''$. If both $C'$ and $C''$ are degree 2 components (Fig. 4(c)) we construct $F$ as

$$F(\alpha_1, \alpha_2) = F'(\alpha_1, \alpha_2 + \alpha'') + F''(\alpha_1 + \alpha', \alpha_2),$$

assuming the second external edge of $C'$ is the first external edge of $C''$, the first external edge of $C'$ is the first external edge of $C$, and the second external edge of $C''$ is the second external edge of $C$ (other edge numberings are handled similarly). If $C'$ is a component of degree $d_{C'} \ge 2$ and $C''$ a component of degree 1 (Fig. 4(d)), we construct $F$, this time assuming the $d_{C'}$'th external edge of $C'$ is the first (and only) external edge of $C''$, and the $d_C$ external edges of $C$ correspond to the $d_C$ first external edges of $C'$:

$$F(\alpha_1, \alpha_2, \ldots, \alpha_{d_C}) = F'(\alpha_1, \alpha_2, \ldots, \alpha_{d_C}, \alpha'') + F''(\alpha_1 + \alpha_2 + \cdots + \alpha_{d_C} + \alpha').$$

As a special case of the above, if both $C'$ and $C''$ are of degree 1 (Fig. 4(e)), we note that $F$ is constant: $F = F'(\alpha'') + F''(\alpha')$. If $C$ is a simple component, $F$ is a polynomial of degree at most 4 with no more than $d^2$ variables. By induction on the way the $F$'s are constructed, this is then seen to hold for any component. At any node $v$ we observe that the $F$ (and $\alpha$) to be stored is constructible in $O(d^8)$ time. This implies the following lemma, similar to Brodal et al. [9], Lemma 5:

Lemma 5.3. The tree $H_T$ can be decorated with the information described above in time and space $O(d^8 n)$.

The following lemma, similar to Brodal et al. [9], Lemma 6, arises as a consequence of Lem. 5.2 and the fact that the decoration stored at a node $v$ in $H_T$ is constructible in $O(d^8)$ time, given the decoration at its children.

Lemma 5.4. The decoration of $H_T$ can be updated in $O(d^9(k + k \log \frac{n}{k}))$ time when the colour of $k$ elements in $S$ changes.

The above results imply the running time of the basic algorithm. We now turn to the details of contract and extract used in the improved algorithm. The procedure contract(u, Y) yields a tree $Y'$ of size $O(|U|)$ in time $O(d^9 |Y|)$, letting the leaves present in $U$ remain untouched in $Y'$. This is accomplished by copying $Y$ and contracting edges corresponding to legal compositions. This way $Y'$ contains nodes corresponding to simple or composite components. The functions and vectors stored at these components are inherited by the nodes they correspond to. The edges of $Y'$ are a subset of the edges of $Y$, namely the edges not contracted. The following lemma, an extension of Brodal et al. [9], Lemma 4, ensures that $Y'$ has no more than $4|U|$ nodes, and that each of the leaves in $U$ is a leaf in $Y'$.

Lemma 5.5. Let $T$ be an unrooted tree with $n$ nodes of degree at most $d$, and let $k \ge 0$ leaves be marked as non-contractible. In $O(n)$ time a decomposition of $T$ into at most $4k$ components can be computed such that each marked leaf is a component by itself.

Creating the information to be stored at the nodes of $Y'$ uses $O(d^8)$ time per contraction made, that is, contract(u, Y) completes in time $O(d^9 |Y|)$. We can calculate $H_{Y'}$ in the time stated, as each node of $Y'$ has an associated vector and function.

Likewise, extract(u, Y) yields a contracted tree $Y'$ of size $O(|U| \log \frac{|Y|}{|U|})$ in time $O(d^9 |U| \log \frac{|Y|}{|U|})$. This is achieved by using the hierarchical decomposition tree $H_Y$. We mark the internal nodes of $H_Y$ on the $|U|$ root-to-leaf paths leading to the leaves in $U$. Doing this bottom-up, one leaf at a time, we can stop marking when an already marked node is encountered. Lem. 5.2 then bounds the number of marked nodes. Removing all these marked nodes yields a set of subtrees of $H_Y$. The root nodes of these subtrees correspond to components in $Y$. We let these root nodes be the nodes of $Y'$.
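The marking step of extract can be illustrated on the simplest locally balanced tree: a perfect binary tree stored in heap layout (root 1, children $2i$ and $2i+1$). Such a tree is $\lambda$-locally balanced with $\lambda = 1$, since every subtree with $m$ leaves has height $\log m \le 1 \cdot (1 + \log m)$. The Python sketch below marks the union of root-to-leaf paths bottom-up, one leaf at a time, stopping at the first already-marked node, and compares the number of marked nodes with the bound $k(3\lambda + 4) + 2\lambda k \log \frac{n}{k}$ of Lemma 5.2 for $\lambda = 1$. The heap layout and function names are assumptions made for illustration; a hierarchical decomposition tree $H_Y$ is in general not a perfect binary tree.

```python
import math
import random


def mark_paths(num_leaves, chosen):
    """Mark the union of root-to-leaf paths in a perfect binary tree in heap layout
    (root 1, children 2i and 2i+1, leaves num_leaves .. 2*num_leaves - 1).
    Marking runs bottom-up, one leaf at a time, and stops at the first node that
    is already marked, so the work is O(len(marked) + len(chosen))."""
    marked = set()
    for leaf in chosen:
        node = num_leaves + leaf
        while node >= 1 and node not in marked:
            marked.add(node)
            node //= 2
    return marked


def lemma_5_2_bound(n, k, lam):
    """Bound of Lemma 5.2 on the size of the union of k root-to-leaf paths."""
    return k * (3 * lam + 4) + 2 * lam * k * math.log2(n / k)


n = 1024                                   # a power of two, so the tree is perfect
rng = random.Random(5)
for k in (1, 10, 100, n):
    marked = mark_paths(n, rng.sample(range(n), k))
    print(k, len(marked), len(marked) <= lemma_5_2_bound(n, k, lam=1))
```

A single leaf marks its 11-node root path, while all 1024 leaves together mark all 2047 nodes; the early stop is what keeps the work proportional to the size of the union rather than to $k \log n$.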
Having the external edges of each of the components corresponding to the nodes of $Y'$, we connect two such nodes if they share an external edge. This can be done in time linear in the number of edges, assuming that the edges are labelled. The leaves in $U$ are also leaves in $Y'$. In order to consider all leaves in $Y'$ coloured $C$, we let the nodes of $H_Y$ store another vector $\alpha^C$ and function $F^C$. These are defined equivalently to $\alpha$ and $F$, with the exception that they assume that all leaves in the associated component are coloured $C$. These can be constructed once and for all when $H_Y$ is constructed. We let $\alpha^C$ and $F^C$ be the information stored at the nodes of $Y'$. We note that we use $O(d^8)$ time, copying information, per node in the extraction.
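The composition rule for two components of degree one (Fig. 4(e)), $F = F'(\alpha'') + F''(\alpha')$ with $\alpha'$ and $\alpha''$ the colour-count vectors of the leaves in the two components, can be checked numerically. The Python sketch below splits a tree at an edge $(x, y)$ into the two components on either side, evaluates each component's function by summing the node function $F$ of the simple inner-node components over its inner nodes, with the far side replaced by its colour-count vector, and compares the sum with the count over the whole tree. It is an illustration under stated assumptions: the functions are evaluated directly rather than stored as polynomials, colour-count vectors are Counters, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def tree(edges):
    """Adjacency dict of an unrooted tree given as an edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    return adj


def F_node(vectors):
    """F of a simple inner-node component; one colour Counter per incident subtree."""
    total = 0
    for i, vi in enumerate(vectors):
        for vj, vk in combinations(vectors[:i] + vectors[i + 1:], 2):
            for x, nx in vi.items():
                for y, ny in vj.items():
                    for z, nz in vk.items():
                        if len({x, y, z}) == 3:   # three distinct colours
                            total += comb(nx, 2) * ny * nz
    return total


def side(adj, x, y):
    """Node set of the component on x's side of the edge (x, y)."""
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w != y and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def leaf_counts(adj, colour, nodes):
    """Colour-count vector (alpha) of the leaves inside a component."""
    return Counter(colour[u] for u in nodes if len(adj[u]) == 1)


def component_F(adj, colour, nodes, port, ext):
    """Evaluate F of a component with at most one external edge: `port` is the
    node carrying it and `ext` the colour counts beyond it (port=None: none)."""
    total = 0
    for u in nodes:
        if len(adj[u]) == 1:
            continue                          # leaf: F is identically zero
        vectors = []
        for w in adj[u]:
            if w not in nodes:                # the external edge itself
                vectors.append(ext)
                continue
            counts, seen, stack = Counter(), {u, w}, [w]
            while stack:                      # leaves behind w, inside the component
                z = stack.pop()
                if len(adj[z]) == 1:
                    counts[colour[z]] += 1
                if z == port:                 # plus everything beyond the external edge
                    counts.update(ext)
                for t in adj[z]:
                    if t in nodes and t not in seen:
                        seen.add(t)
                        stack.append(t)
            vectors.append(counts)
        total += F_node(vectors)
    return total


def check_composition(adj, colour, x, y):
    """Return (count over the whole tree, F'(alpha'') + F''(alpha')) for the split at (x, y)."""
    c1, c2 = side(adj, x, y), side(adj, y, x)
    whole = component_F(adj, colour, set(adj), None, Counter())
    composed = (component_F(adj, colour, c1, x, leaf_counts(adj, colour, c2))
                + component_F(adj, colour, c2, y, leaf_counts(adj, colour, c1)))
    return whole, composed


T1 = tree([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("y", "z"), ("d", "z"), ("e", "z")])
colouring = {"a": "C", "b": "C", "c": ("B", 1), "d": "A", "e": "A"}  # according to y
print(check_composition(T1, colouring, "x", "y"))  # (4, 4)
```

The identity holds for every colouring, because each node's incident colour counts are the same whether they are read from the whole tree or from one component plus the vector of the other.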

6. Calculating Shared Star Quartets

The last step is the calculation of shared star quartets between $T$ and $T'$. We adopt the notions of associated and compatible from butterfly quartets. We can, in the same way as above, construct polynomials $G$ counting the number of star quartets associated with simple components of $T$ and compatible with the current colouring of $S$. As there are no star quartets associated to a leaf of $T$, $G(\alpha_1) = 0$. At internal nodes of $T$, with $\alpha_{i,x}$ the number of leaves of the $x$'th colour in the part of $T$ reached through the $i$'th external edge:

$$G(\alpha_1, \alpha_2, \ldots, \alpha_{d_C}) = \sum_{i=1}^{d_C} \sum_{j > i} \sum_{k > j} \sum_{l > k} \; \sum_{x=1}^{d} \sum_{y \ne x} \sum_{z \ne x,\, y} \sum_{w \ne x,\, y,\, z} \alpha_{i,x} \alpha_{j,y} \alpha_{k,z} \alpha_{l,w}.$$

The construction of the $G$'s at internal nodes of the hierarchical decomposition tree corresponds to the construction of the $F$'s at these nodes. We note that $G$ is itself a polynomial of degree 4 with no more than $d^2$ variables, i.e. it can be stored and manipulated in $O(d^8)$ space and time. We conclude that we can extend both the basic and the improved algorithm, by associating $G$'s to the nodes of the trees, into counting shared star quartets as well as the shared butterfly quartets. This enables the calculation of the quartet distance between $T$ and $T'$.

References

1. J. Felsenstein. Inferring Phylogenies. Sinauer Associates Inc., 2004.
2. D. F. Robinson and L. R. Foulds. Comparison of weighted labelled trees. In Combinatorial Mathematics, VI (Proc. 6th Austral. Conf.), Lecture Notes in Mathematics, pages 119–126. Springer, 1979.
3. M. S. Waterman and T. F. Smith. On the similarity of dendrograms. Journal of Theoretical Biology, 73:789–800, 1978.
4. B. L. Allen and M. Steel. Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics, 5:1–13, 2001.
5. D. F. Robinson and L. R. Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53:131–147, 1981.
6. G. Estabrook, F. McMorris, and C. Meacham. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst. Zool., 34:193–200, 1985.
7. D. Bryant, J. Tsang, P. E. Kearney, and M. Li. Computing the quartet distance between evolutionary trees. In Proceedings of the 11th Annual Symposium on Discrete Algorithms (SODA), pages 285–286, 2000.
8. M. Steel and D. Penny. Distribution of tree comparison metrics: some new results. Syst. Biol., 42(2):126–141, 1993.
9. G. S. Brodal, R. Fagerberg, and C. N. S. Pedersen. Computing the quartet distance between evolutionary trees in time O(n log n). Algorithmica, 38:377–395, 2003.
10. C. Christiansen, T. Mailund, C. N. S. Pedersen, and M. Randers. Computing the quartet distance between trees of arbitrary degree. In R. Casadio and G. Myers, editors, WABI, volume 3692 of LNCS, pages 77–88. Springer, 2005. ISBN 3-540-29008-7.