Parsimonious cluster systems


François Brucker (1) and Alain Gély (2)

(1) Laboratoire LIF, UMR 6166, Centre de Mathématiques et Informatique, 39 rue Joliot-Curie - F Marseille Cedex13.
(2) LITA - Paul Verlaine University, Ile du Saulcy BP Metz cedex, France, alain.gely@univ-metz.fr

Abstract. We introduce in this paper a new clustering structure, parsimonious cluster systems, which generalizes phylogenetic trees. We characterize it as the set of hypertrees stable under restriction and prove that this set is in bijection with a known dissimilarity model: chordal quasi-ultrametrics. We then present one possible way to graphically represent elements of this model.

Keywords: overlapping clustering, parsimony, phylogenetic trees, dissimilarities.

1 Introduction

In systematic biology, to infer the evolutionary history between DNA sequences or animal species one usually assumes that this history is parsimonious. In some sense, the parsimony principle is the biological formulation of the maximum likelihood approach in statistics or the energy minimization principle in physics. In a word, the simpler the better. Thus:

- Every species has only one immediate ancestor.
- It is simpler to conserve a species than to recreate it.

These two remarks imply that the graph which links a species to its direct ancestor is a tree (the phylogenetic or evolution tree), the eldest ancestor, when it exists, being named the root (see for instance Semple and Steel, 2003).

The parsimony principle associated with phylogenetic trees is a good model for many situations in fields other than biology, for instance in psychology (Tversky, 1977), in text-mining (Lepouliquen, 2008) or more generally in all fields where the studied objects (e.g. DNA) or parts of them (e.g. genes) are in some way transmitted, combined, or modified over time.

Definition 1. A dissimilarity $d$ on a set $X$ is a symmetric function from $X \times X$ to the set of non-negative real numbers for which $d(x, y) = 0$ if $x = y$. A proper dissimilarity $d$ on $X$ is a dissimilarity for which $d(x, y) = 0$ if and only if $x = y$. Since we will only speak about proper dissimilarities, we will consider that a dissimilarity is always proper. A distance on $X$ is a dissimilarity such that for all $x, y, z \in X$, $d(x, z) \le d(x, y) + d(y, z)$ (triangular inequality).

In this paper the finite set $X$ will denote the set of objects studied, usually the species existing today in phylogenetics. In a phylogenetic tree the vertices are then the elements of $X$ and their common ancestors, denoted as latent vertices (see for instance Felsenstein, 1983). When the edges of such a tree are numerically valued by a difference between the two corresponding linked vertices (the smaller the value the closer the vertices), the distance between two arbitrary vertices computed by summing the values of the edges of the path joining them is a tree distance. A valued phylogenetic tree is usually named an X-tree.

A well known result (Buneman, 1971, among others) shows that tree distances and X-trees are in bijection. To each distance $d$ on $X$ satisfying the property of Theorem 1 below can be associated a unique X-tree for which the sum of the values of the edges in the path between two elements of $X$ is equal to $d$. Here the latent vertices of the tree (ancestors in phylogeny) can be considered to have been introduced in order to preserve the distance property (evolution history).

Theorem 1 (Buneman, 1971, among others). A distance $d$ on $X$ is a tree distance if and only if for all $x, y, z, t \in X$:

$$d(x, y) + d(z, t) \le \max\{d(x, z) + d(y, t),\ d(x, t) + d(y, z)\} \qquad (1)$$
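Inequality (1) can be checked by brute force on a small example. The following is only a minimal sketch: the two distances, the helper `from_pairs` and the function name `is_tree_distance` are illustrative assumptions, not material from the paper.

```python
from itertools import product


def from_pairs(points, pairs):
    """Build a symmetric table d[x][y] from values given on unordered pairs."""
    d = {x: {y: 0 for y in points} for x in points}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = value
    return d


def is_tree_distance(points, d):
    """Check inequality (1) on every quadruple (x, y, z, t) of points."""
    return all(
        d[x][y] + d[z][t] <= max(d[x][z] + d[y][t], d[x][t] + d[y][z])
        for x, y, z, t in product(points, repeat=4)
    )


POINTS = ["x", "y", "z", "t"]

# Hypothetical valued tree: x and y hang from u, z and t from v,
# pendant edges have value 1 and the inner edge uv has value 2.
TREE_D = from_pairs(POINTS, {
    ("x", "y"): 2, ("z", "t"): 2,
    ("x", "z"): 4, ("x", "t"): 4, ("y", "z"): 4, ("y", "t"): 4,
})

# Shortest-path distance on the 4-cycle x - z - y - t - x with unit edges.
CYCLE_D = from_pairs(POINTS, {
    ("x", "z"): 1, ("z", "y"): 1, ("y", "t"): 1, ("t", "x"): 1,
    ("x", "y"): 2, ("z", "t"): 2,
})

print(is_tree_distance(POINTS, TREE_D))   # True
print(is_tree_distance(POINTS, CYCLE_D))  # False
```

The cycle distance fails on the quadruple (x, y, z, t): the two diagonals sum to 4, while both other pairings sum to 2.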

One of the benefits of dealing with objects described by a dissimilarity is that one can find groups of similar elements by considering dissimilarity clusters.

Definition 2. Let $d$ be a dissimilarity on $X$. The diameter for $d$ of a set $A \subseteq X$, written $\mathrm{diam}_d(A)$, is the largest dissimilarity between two elements of this set.

Definition 3. A dissimilarity cluster for a dissimilarity $d$ on $X$ is a subset $A$ of $X$ with $\mathrm{diam}_d(A) < \min\{\mathrm{diam}_d(A \cup \{x\}) \mid x \in X \setminus A\}$.

Dissimilarity clusters are a generalization of maximal cliques in a graph.

Definition 4. A threshold graph at level $\alpha$ for a dissimilarity $d$ on $X$ is the graph $G_\alpha = (X, E_\alpha)$ where $xy \in E_\alpha$ if and only if $d(x, y) \le \alpha$.

Indeed, if $d$ is a dissimilarity on $X$, the dissimilarity clusters of $d$ are exactly the maximal cliques of all the threshold graphs of $d$. A given dissimilarity cluster $A$ of some dissimilarity $d$ on $X$ then has two properties:

- external isolation, because it is a maximal clique of the graph $G_{\mathrm{diam}_d(A)}$,
- an internal cohesion measure, because the dissimilarity between any pair of elements in $A$ is at most $\mathrm{diam}_d(A)$. The smaller the diameter the greater the internal cohesion.

Definition 5. A distance $d$ on $X$ is an ultrametric if for all $x, y, z \in X$:

$$d(x, z) \le \max\{d(x, y),\ d(y, z)\} \qquad (2)$$

Definition 6. A strong hierarchy $E$ on $X$ is a subset of $2^X$ such that for all $A, B \in E$: $A \cap B \in \{\emptyset, A, B\}$.

When a tree distance $d$ on $X$ is also an ultrametric (ultrametrics are distances whose clusters form a strong hierarchy), the latent vertices of its associated X-tree are exactly the dissimilarity clusters of $d$. Due to the so-called long-branch issue, this is no longer the case in general. For instance, the tree distance on $X = \{x, y, z, t\}$ associated with the valued tree of Fig. 1 (a) is an ultrametric and its dissimilarity clusters are $\{\{x\}, \{y\}, \{z\}, \{t\}, \{x, y\}, \{z, t\}, \{x, y, z, t\}\}$. This is no longer true for the tree distance associated with the tree of Fig. 1 (b): even though the tree shape is the same, its dissimilarity clusters are $\{\{x\}, \{y\}, \{z\}, \{t\}, \{z, t\}, \{x, z, t\}, \{y, z, t\}, \{x, y, z, t\}\}$, and the latent vertex $u$ does not correspond to the cluster $\{x, y\}$ anymore. The long branches $ux$ and $uy$ have drastically changed the dissimilarity clusters.

Figure 1. Long branches issue (panels (a) and (b)).

We propose to directly use the concept of parsimony for defining a clustering structure adapted to a phylogeny instead of inferring clusters from latent nodes.

This paper is organized as follows. First, we formally introduce a new clustering model called parsimonious cluster system. Then we characterize it, and show that there are bijections between parsimonious cluster systems, a known dissimilarity structure (chordal quasi-ultrametrics) and a known lattice structure (dismantlable lattices). Finally, we illustrate on a short example a known algorithm which computes a chordal quasi-ultrametric, and propose a possible graphical representation of a given parsimonious cluster system.

2 Parsimonious cluster system

In classification, clusters comprise objects of a set $X$ which have something in common. Before precisely defining the notion, we will use it to motivate the concept of parsimonious cluster systems.

Suppose that a species $S_1$ with characteristics $(u, v, w)$ (among others) evolves into a first species $S_2$ with characteristics $(U, v, w)$ and a second one $S_3$ keeping the three characteristics of its ancestor. Then, species $S_2$ evolves into species $S_4$ and $S_5$ having $(U, V, w)$ and $(U, v, w)$ as characteristics, respectively. Thus, considering only the living species:

- Animals of species $S_4$ and $S_5$ share the characteristics $U$ and $w$,
- Animals of species $S_3$ and $S_5$ share the characteristics $v$ and $w$,
- The characteristic $w$ shared by $S_3$ and $S_4$ also belongs to $S_5$.

Species $S_3$ and $S_4$ represent a border of this phylogeny because $S_5$ shares only some of their properties: properties shared by two elements of the border of a phylogeny are also shared by elements which are between them. This can be stated as a cluster property when grouping elements which share one or more properties. Let $E \subseteq 2^X$ be a set of clusters. For all $x_1, x_2, x_3 \in X$: if there exists a cluster $A_1 \in E$ for which $x_1, x_2 \in A_1$ and $x_3 \notin A_1$ (animals sharing properties $U$ and $w$), and if there exists a cluster $A_2 \in E$ for which $x_2, x_3 \in A_2$ and $x_1 \notin A_2$ (animals sharing properties $v$ and $w$), then for any cluster $A_3 \in E$ such that $x_1, x_3 \in A_3$ (animals in $A_3$ share at least one property with animals $x_1$ and $x_3$), we also have $x_2 \in A_3$ (animals sharing property $w$).

To be a parsimonious cluster system, we postulate that this must also be true for an arbitrary number of clusters: if there exist $p > 1$ clusters $A_i \subseteq X$ ($1 \le i \le p$) and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which, for all $1 \le i \le p$, we have $x_i, x_{i+1} \in A_i$, for all $1 < i < p$, we have $x_j \notin A_i$ for $j = i - 1, i + 2$, and if, moreover, $x_3 \notin A_1$ and $x_{p-1} \notin A_p$, then any cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ also contains $x_j$ for $1 \le j \le p + 1$.
That is, if there exist $p + 1$ animals $x_1, \ldots, x_{p+1}$ such that for any given $i$ ($1 < i < p + 1$), $x_i$ shares only part of the properties of $x_{i-1}$ and $x_{i+1}$ (there exist two property clusters, one containing $x_{i-1}$ and $x_i$ but not $x_{i+1}$ and the other containing $x_i$ and $x_{i+1}$ but not $x_{i-1}$), then all the properties shared by both $x_1$ and $x_{p+1}$ are also shared by all the $x_j$ ($1 \le j \le p + 1$). The species $x_{i-1}$ and $x_{i+1}$ can be seen as a local border (the condition with 3 clusters) which is extended to a global border containing $x_1$ and $x_{p+1}$: it is simpler to conserve the properties shared by a border for animals between them than to create new ones.

A parsimonious cluster system is then a cluster system satisfying the above condition. There are several ways of defining a cluster system. We chose the following one:

Definition 7. A cluster system $H$ on $X$ is a couple $(X, E)$ such that $E$ is a subset of $2^X$ for which:

- $\emptyset \notin E$,
- $\{x\} \in E$ for all $x \in X$,
- $X \in E$,
- $A, B \in E$ and $A \cap B \ne \emptyset$ implies $A \cap B \in E$.

Elements of $E$ are said to be clusters of $H$.

In its full generality the last condition, the closure, is not really needed for defining a cluster system, but since we will only speak about subsets of $2^X$ closed under finite intersection, we decided to add it to the definition. A formal definition of parsimonious cluster system is then:

Definition 8. A parsimonious cluster system $H = (X, E)$ is a cluster system such that for all $p > 1$, if there exist $p$ clusters $A_i \subseteq X$ ($1 \le i \le p$) and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which

- for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$,
- for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$,
- $x_3 \notin A_1$ and $x_{p-1} \notin A_p$,

then any cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ also contains $x_j$ for $1 \le j \le p + 1$.

For $p = 2$, the above condition is equivalent to the fact that for any three clusters $A$, $B$ and $C$ in $H$, we have $A \cap B \cap C \in \{A \cap B,\ B \cap C,\ A \cap C\}$. This condition is known in overlapping clustering as the weak-hierarchy condition (Bandelt and Dress, 1989). Parsimonious cluster systems are then a special case of weak-hierarchical cluster systems. This allows us to use the formalism of binary clustering (Barthélemy and Brucker, 2008) in order to define precisely the notions of external isolation and internal cohesion for clusters of a parsimonious cluster system.

It is clear that strong hierarchies and interval hypergraphs (cluster systems on $X$ for which each cluster is an interval of a given linear order on $X$) are parsimonious cluster systems. For instance, Fig. 2 shows the Hasse diagram of an interval hypergraph for the linear order $x < y < z < t < u < v$.

Figure 2. Interval hypergraphs are parsimonious cluster systems.

Definition 9. A binary dissimilarity $\delta$ on $X$ is a symmetric function from $X \times X$ to $2^X$ such that:

- $x, y \in \delta(x, y)$ for all $x, y \in X$,
- $z, t \in \delta(x, y)$ implies $\delta(z, t) \subseteq \delta(x, y)$ for all $x, y \in X$,
- $\delta(x, x) = \{x\}$ for all $x \in X$,
- there exist $x, y \in X$ such that $\delta(x, y) = X$.

Binary dissimilarities are a general tool for manipulating cluster systems. In our case, we will use the bijection between weak-hierarchical cluster systems $H = (X, E)$ and a subclass of the binary dissimilarities on $X$:

Theorem 2 (Barthélemy and Brucker, 2008). A weak-hierarchical binary dissimilarity on $X$ is a binary dissimilarity $\delta$ on $X$ such that for all $x_1, x_2, x_3 \in X$: if $x_3 \notin \delta(x_1, x_2)$ and $x_1 \notin \delta(x_2, x_3)$ then $x_2 \in \delta(x_1, x_3)$. There is a one-to-one correspondence between the set of weak-hierarchical cluster systems on $X$ and the set of all weak-hierarchical binary dissimilarities on $X$.

Let $H = (X, E)$ be a weak-hierarchical cluster system. The mapping of Theorem 2 associates to $H$ a weak-hierarchical binary dissimilarity $\delta_H$ such that, for all $x, y \in X$:

$$\delta_H(x, y) = \min\{A \mid x, y \in A;\ A \in E\}$$

For instance, consider the dendrogram of Fig. 3 representing a strong hierarchy (which is also a weak-hierarchical cluster system). Considered as a binary dissimilarity, we have:

- the cluster 1: $\{x, y\} = \delta(x, y)$,
- the cluster 2: $\{x, y, z\} = \delta(x, z) = \delta(y, z)$,
- the cluster 3: $\{u, v\} = \delta(u, v)$,
- the cluster 4: $\{x, y, z, t, u, v\} = \delta(x, t) = \delta(x, u) = \delta(x, v) = \delta(y, t) = \delta(y, u) = \delta(y, v) = \delta(z, t) = \delta(z, u) = \delta(z, v) = \delta(t, u) = \delta(t, v)$.

Figure 3. A strong hierarchy.

A parsimonious binary dissimilarity can then be defined as follows:

Definition 10. A parsimonious binary dissimilarity on $X$ is a binary dissimilarity $\delta$ on $X$ for which, for every sequence $x_1, x_2, \ldots, x_p \in X$, $x_{i-1} \notin \delta(x_i, x_{i+1})$ and $x_{i+1} \notin \delta(x_{i-1}, x_i)$ for any $1 < i < p$ imply that $x_i \in \delta(x_1, x_p)$ for all $1 \le i \le p$.

It is clear from the definition of a binary dissimilarity and by Theorem 2 that parsimonious cluster systems on $X$ are in bijection with parsimonious binary dissimilarities. This bijection provides a precise specification of the notions of parsimony, external isolation and internal cohesion for clusters.

The term parsimonious in parsimonious cluster systems is motivated by the fact that if we already have $\delta(x_1, x_2), \delta(x_2, x_3), \ldots, \delta(x_{p-1}, x_p)$ it is simpler to assume that $\delta(x_1, x_p)$ contains the path $\delta(x_i, x_{i+1})$ (for $1 \le i < p$) from $x_1$ to $x_p$ than to create a new cluster.

Moreover, let $H = (X, E)$ be a parsimonious cluster system on $X$. Each particular cluster $A \in E$ is generated by two elements $x$ and $y$ ($\delta_H(x, y) = A$) just like an ancestor is defined as a predecessor of two new species. Finally, for any $z, t \in A$, $\delta_H(z, t) \subseteq A$ (internal cohesion) and for any $z \notin A$, either $A \subset \delta_H(x, z)$ or $A \subset \delta_H(y, z)$ (external isolation).

The next sections will fully characterize parsimonious cluster systems as a particular cluster system, as a particular dissimilarity, and as a particular lattice, respectively.

2.1 Cluster related characterization

We will prove in this section that parsimonious cluster systems are exactly the hypertrees stable under restriction.

Definition 11. A hypertree $H = (X, E)$ is a cluster system for which there exists an underlying vertex tree $T = (X, E_T)$ with edge set $E_T$ where all clusters of $H$ are connected parts of $T$.

Theorem 3 (Duchet, 1978; Flament, 1978). A cluster system $H = (X, E)$ is a hypertree if and only if:

- whenever $A_1, \ldots, A_p \in E$ and $A_i \cap A_j \ne \emptyset$ for all $1 \le i < j \le p$ then $A_1 \cap \cdots \cap A_p \ne \emptyset$ (Helly property),
- the graph $G = (E, E_G)$, where $\{A, B\} \in E_G$ if and only if $A \cap B \ne \emptyset$, is chordal.

Note that a graph $G = (X, E)$ is said to be chordal if for any $p > 3$ and distinct $x_1, \ldots, x_p \in X$ it holds: $x_i x_{i+1} \in E$ for $1 \le i < p$ and $x_1 x_p \in E$ imply that there exists $(i, j) \ne (1, p)$ with $1 \le i < j - 1 < p$ such that $x_i x_j \in E$ (every cycle of length at least 4 has a chord).

The first condition of the above theorem is always satisfied by weak-hierarchical cluster systems, thus by parsimonious cluster systems. Moreover, since parsimonious cluster systems are in bijection with parsimonious binary dissimilarities, it is clear using the second axiom of Def. 9 and the definition of a parsimonious binary dissimilarity that the second condition is also satisfied. Thus, a parsimonious cluster system is a hypertree.

Let $H = (X, E)$ be a cluster system. The restriction of $H$ to a subset $Y \subseteq X$ is defined as $H_Y = (Y, E_Y)$ with $E_Y = \{A \cap Y \mid A \in E;\ A \cap Y \ne \emptyset\}$. For all $Y \subseteq X$, it is clear that $H_Y$ is also a cluster system. Theorem 4 shows that the notion of parsimonious cluster system is stable under restriction.

Theorem 4. Let $H = (X, E)$ be a parsimonious cluster system. Then, for all $Y \subseteq X$, the cluster system $H_Y$ is also a parsimonious cluster system.

Proof. Suppose that there exists $Y \subseteq X$ such that $H_Y = (Y, E_Y)$ is not a parsimonious cluster system. There exists then $p > 1$ for which there exist $p$ clusters $A_i \subseteq Y$ ($1 \le i \le p$) of $H_Y$ and $p + 1$ objects $x_i \in Y$ ($1 \le i \le p + 1$) for which for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$, for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$, $x_3 \notin A_1$ and $x_{p-1} \notin A_p$, and a cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ but not $x_j$ for one $1 \le j \le p + 1$. Since $x_1, \ldots, x_{p+1} \in Y$, if $A_i \notin E$ (with $1 < i < p$) there exists $B_i \in E$ where $B_i \cap Y = A_i$ such that $x_i, x_{i+1} \in B_i$ and $x_j \notin B_i$ for $j = i - 1, i + 2$. Since the argument holds for $A_1$, $A_p$ and $A_{p+1}$, one can form $p + 1$ clusters of $H$ which violate the parsimonious cluster system definition: this is a contradiction.
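The restriction $H_Y$ defined above is easy to compute. The sketch below is a minimal illustration, assuming clusters are stored as Python frozensets; the interval-type example and the function names `restrict` and `is_cluster_system` are hypothetical and do not come from the paper. It checks on one example that every restriction again satisfies the four conditions of Definition 7.

```python
from itertools import combinations


def restrict(clusters, subset):
    """E_Y = {A & Y : A in E, A & Y nonempty}, as a set of frozensets."""
    y = frozenset(subset)
    return {a & y for a in clusters if a & y}


def is_cluster_system(ground, clusters):
    """Check the four conditions of Definition 7."""
    ground = frozenset(ground)
    if frozenset() in clusters or ground not in clusters:
        return False
    if any(frozenset([x]) not in clusters for x in ground):
        return False
    # Closure: every nonempty pairwise intersection is again a cluster.
    return all(a & b in clusters for a, b in combinations(clusters, 2) if a & b)


# Hypothetical interval hypergraph for the linear order a < b < c < d.
X = "abcd"
E = {frozenset(s) for s in
     ["a", "b", "c", "d", "ab", "bc", "cd", "abc", "bcd", "abcd"]}

E_Y = restrict(E, "acd")
print(sorted("".join(sorted(a)) for a in E_Y))
# ['a', 'ac', 'acd', 'c', 'cd', 'd']
```

Several clusters of $E$ may collapse onto the same trace on $Y$ (here `ab` and `abc` both give `a`), which is why the result is stored as a set.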

Since parsimonious cluster systems are hypertrees, Theorem 4 shows that parsimonious cluster systems are part of the set of hypertrees that are stable under restriction. The converse is also true:

Theorem 5. Let $H = (X, E)$ be a hypertree such that for all $Y \subseteq X$, the cluster system $H_Y = (Y, E_Y)$ is a hypertree. Then the cluster system $H$ is a parsimonious cluster system.

Proof. Suppose that $H = (X, E)$ is not a parsimonious cluster system. There exist then $p > 1$, $p$ clusters $A_i \subseteq X$ ($1 \le i \le p$) of $H$ and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$, for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$, $x_3 \notin A_1$ and $x_{p-1} \notin A_p$, and a cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ but not $x_j$ for one $1 < j < p + 1$. Let then $i_0$ be the largest integer less than $j$ such that $x_{i_0} \in A_{p+1}$ and $i_1$ the smallest integer larger than $j$ such that $x_{i_1} \in A_{p+1}$. The restriction of $H$ to $\{x_{i_0}, \ldots, x_{i_1}\}$ will then contradict either the first condition of Theorem 3 if $i_0 = j - 1$ and $i_1 = j + 1$, or the second condition in the other cases. This leads to a contradiction.

Theorems 4 and 5 lead to Theorem 6 which fully characterizes parsimonious cluster systems:

Theorem 6. For a cluster system $H = (X, E)$ the two following propositions are equivalent:

- $H$ is a parsimonious cluster system,
- $H_Y$ is a hypertree for all $Y \subseteq X$.

Fig. 4 shows that not all hypertrees are stable under restriction. The cluster system of Fig. 4 (a) is not a parsimonious cluster system (although it is a hypertree and a weak-hierarchical cluster system) because the restriction of this hypertree to $\{x_1, \ldots, x_4\}$ induces the clusters $\{x_1, x_2\}$, $\{x_2, x_3\}$, $\{x_3, x_4\}$ and $\{x_4, x_1\}$. Deleting one of the clusters of this hypertree (Fig. 4 (b)) yields a parsimonious cluster system (no restriction can create a cycle anymore). This supports the idea of using parsimonious cluster systems as an evolution model: the history does not change whether one considers the entire animal kingdom or a part of it.
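Theorem 6 suggests a brute-force test on small examples: restrict to every subset $Y$ and check the two conditions of Theorem 3, read here as a characterization of hypertrees. The sketch below is exponential and only meant for toy cases. The cluster systems `E_A` and `E_B` are an assumed reading of Fig. 4 (four outer elements around a central one), not data taken from the paper, and all function names are illustrative.

```python
from itertools import combinations


def is_chordal(vertices, adjacent):
    """Chordality test: repeatedly delete a vertex whose neighbours form a clique."""
    remaining = set(vertices)
    while remaining:
        for v in list(remaining):
            nbrs = [u for u in remaining if u != v and adjacent(u, v)]
            if all(adjacent(a, b) for a, b in combinations(nbrs, 2)):
                remaining.remove(v)
                break
        else:
            return False  # no simplicial vertex: a chordless cycle remains
    return True


def has_helly_property(clusters):
    """Brute force: each pairwise-intersecting subfamily has a common element."""
    clusters = list(clusters)
    for size in range(3, len(clusters) + 1):
        for family in combinations(clusters, size):
            if all(a & b for a, b in combinations(family, 2)):
                if not frozenset.intersection(*family):
                    return False
    return True


def is_hypertree(clusters):
    """Both conditions of Theorem 3."""
    return has_helly_property(clusters) and is_chordal(
        clusters, lambda a, b: bool(a & b))


def restrict(clusters, subset):
    y = frozenset(subset)
    return {a & y for a in clusters if a & y}


def is_parsimonious(ground, clusters):
    """Second proposition of Theorem 6, checked on every nonempty subset Y."""
    ground = list(ground)
    return all(
        is_hypertree(restrict(clusters, subset))
        for size in range(1, len(ground) + 1)
        for subset in combinations(ground, size)
    )


def closure(ground, generators):
    """Singletons, the ground set, the generators and their nonempty intersections."""
    clusters = {frozenset([x]) for x in ground} | {frozenset(ground)}
    clusters |= {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(clusters), 2):
            if a & b and a & b not in clusters:
                clusters.add(a & b)
                changed = True
    return clusters


# Assumed reading of Fig. 4: "1".."4" stand for x1..x4, "c" for the central element.
X = "1234c"
E_A = closure(X, ["12c", "23c", "34c", "41c"])
E_B = closure(X, ["12c", "23c", "34c"])

print(is_hypertree(E_A), is_parsimonious(X, E_A))  # True False
print(is_parsimonious(X, E_B))                     # True
```

Under this reading, `E_A` fails on $Y = \{x_1, \ldots, x_4\}$, where the four traces form a chordless 4-cycle in the intersection graph, while `E_B` is an interval hypergraph for the order $x_1 < x_2 < c < x_3 < x_4$.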

Figure 4. (a) is not a parsimonious cluster system; (b) is a parsimonious cluster system with five elements, $x_1$, $x_2$, $x_3$, $x_4$ and the central element.

2.2 Metric related characterization

Just as valued phylogenetic trees are called X-trees, we denote by X-hypertrees the valued parsimonious cluster systems. We will speak about X-hypertrees even if the base set of the associated parsimonious cluster system is not $X$. More formally:

Definition 12. Let $H = (X, E)$ be a parsimonious cluster system and $f$ a function from $E$ to the set of non-negative real numbers such that $f(A) = 0 \Leftrightarrow |A| = 1$ and $A \subset B \Rightarrow f(A) < f(B)$. The couple $(H, f)$, or only $H$ when $f$ is obvious, is called an X-hypertree.

We know (Diatta and Fichet, 1998) that for any weak-hierarchical cluster system $H = (X, E)$ there exists a quasi-ultrametric on $X$ whose dissimilarity clusters are exactly the subsets of $X$ contained in $E$.

Definition 13 (Diatta and Fichet, 1998). A dissimilarity $d$ on $X$ is a quasi-ultrametric if for all $x, y, z, t \in X$ such that $\max\{d(x, z), d(y, z)\} \le d(x, y)$:

$$d(z, t) \le \max\{d(x, y),\ d(x, t),\ d(y, t)\} \qquad (3)$$

This condition is called the four point inequality. Moreover, the dissimilarity clusters for a quasi-ultrametric $d$ on $X$ are exactly the elements $B(x, y) = \{z \mid \max\{d(x, z), d(y, z)\} \le d(x, y)\}$ for $x, y \in X$. It is then quite easy to compute them all.
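As a minimal sketch of this computation, the code below lists the sets $B(x, y)$ and compares them with a brute-force application of Definition 3. The dissimilarity values are hypothetical, chosen so that the clusters match those listed for Fig. 1 (a); the function names are illustrative and not from the paper.

```python
from itertools import combinations, product


def from_pairs(points, pairs):
    """Build a symmetric table d[x][y] from values given on unordered pairs."""
    d = {x: {y: 0 for y in points} for x in points}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = value
    return d


def ball(points, d, x, y):
    """B(x, y) = {z : max(d(x, z), d(y, z)) <= d(x, y)}."""
    return frozenset(z for z in points if max(d[x][z], d[y][z]) <= d[x][y])


def two_balls(points, d):
    return {ball(points, d, x, y) for x in points for y in points}


def is_quasi_ultrametric(points, d):
    """Four point inequality (3), checked on every quadruple."""
    return all(
        d[z][t] <= max(d[x][y], d[x][t], d[y][t])
        for x, y, z, t in product(points, repeat=4)
        if max(d[x][z], d[y][z]) <= d[x][y]
    )


def diameter(d, subset):
    return max(d[u][v] for u in subset for v in subset)


def dissimilarity_clusters(points, d):
    """Definition 3, by brute force over all nonempty subsets."""
    found = set()
    for size in range(1, len(points) + 1):
        for a in combinations(points, size):
            outside = [x for x in points if x not in a]
            if all(diameter(d, a) < diameter(d, a + (x,)) for x in outside):
                found.add(frozenset(a))
    return found


POINTS = ["x", "y", "z", "t"]
# Hypothetical ultrametric: 2 inside {x, y} and inside {z, t}, 4 across.
D = from_pairs(POINTS, {
    ("x", "y"): 2, ("z", "t"): 2,
    ("x", "z"): 4, ("x", "t"): 4, ("y", "z"): 4, ("y", "t"): 4,
})

print(is_quasi_ultrametric(POINTS, D))                             # True
print(two_balls(POINTS, D) == dissimilarity_clusters(POINTS, D))   # True
```

The 2-ball computation needs only the pairs $(x, y)$, whereas the brute-force version enumerates all subsets of $X$.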

Since parsimonious cluster systems are weak-hierarchical cluster systems, one can associate a unique quasi-ultrametric $d$ to any X-hypertree $(H, f)$ such that $f(A) = \mathrm{diam}_d(A)$ for any cluster $A$ of $H$.

Let then $d$ be a quasi-ultrametric whose associated cluster system $H$ is a parsimonious cluster system. It is clear that every threshold graph of $d$ (Definition 4) is chordal, because a non-chordal cycle $x_1 \ldots x_p x_1$ would lead to $x_{i-1} \notin \delta_H(x_i, x_{i+1})$, $x_{i+1} \notin \delta_H(x_{i-1}, x_i)$ and $x_i \notin \delta_H(x_1, x_p)$ for all $1 \le i \le p$, which contradicts the definition of an X-hypertree. Finally, a dissimilarity whose associated cluster system is a parsimonious one is a chordal quasi-ultrametric (Brucker, 2001).

Definition 14. A dissimilarity $d$ on $X$ is a chordal quasi-ultrametric on $X$ if:

- $d$ is a quasi-ultrametric,
- every threshold graph of $d$ is chordal.

The converse is also true (Theorem 9) because Theorems 7 and 8 below imply that the set of dissimilarity clusters of a chordal quasi-ultrametric is a hypertree stable under restriction, and thus a parsimonious cluster system.

Theorem 7 (Brucker, 2001). The set of clusters of a chordal quasi-ultrametric is a hypertree.

Theorem 8 (Brucker and Barthélemy, 2007). Let $d$ be a quasi-ultrametric on $X$ and $Y \subseteq X$. The clusters of the restriction of $d$ to $Y$ are exactly the restriction to $Y$ of the clusters of $d$.

Theorem 9. For any X-hypertree $(H, f)$ there exists a unique chordal quasi-ultrametric $d$ such that its dissimilarity clusters coincide with the clusters of $H$ and $\mathrm{diam}_d(A) = f(A)$ for any cluster $A$ of $H$. Conversely, for any chordal quasi-ultrametric $d$ there exists a unique X-hypertree $(H, f)$ such that its clusters coincide with the clusters of $d$ and $\mathrm{diam}_d(A) = f(A)$ for any dissimilarity cluster $A$ of $d$.
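The two conditions of Definition 14 can be tested directly on a small dissimilarity. The following is a hedged sketch: the chordality test (repeated deletion of simplicial vertices) is a standard technique swapped in for illustration, the two example dissimilarities are hypothetical, and none of the function names come from the paper.

```python
from itertools import combinations, product


def from_pairs(points, pairs):
    """Build a symmetric table d[x][y] from values given on unordered pairs."""
    d = {x: {y: 0 for y in points} for x in points}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = value
    return d


def is_chordal(vertices, adjacent):
    """Chordality test: repeatedly delete a vertex whose neighbours form a clique."""
    remaining = set(vertices)
    while remaining:
        for v in list(remaining):
            nbrs = [u for u in remaining if u != v and adjacent(u, v)]
            if all(adjacent(a, b) for a, b in combinations(nbrs, 2)):
                remaining.remove(v)
                break
        else:
            return False
    return True


def is_quasi_ultrametric(points, d):
    """Four point inequality (3) of Definition 13."""
    return all(
        d[z][t] <= max(d[x][y], d[x][t], d[y][t])
        for x, y, z, t in product(points, repeat=4)
        if max(d[x][z], d[y][z]) <= d[x][y]
    )


def threshold_graphs_are_chordal(points, d):
    """Test G_alpha for every value alpha taken by d (other levels add nothing new)."""
    levels = {d[x][y] for x, y in combinations(points, 2)}
    return all(
        is_chordal(points, lambda u, v, a=alpha: d[u][v] <= a)
        for alpha in levels
    )


def is_chordal_quasi_ultrametric(points, d):
    return is_quasi_ultrametric(points, d) and threshold_graphs_are_chordal(points, d)


POINTS = ["x", "y", "z", "t"]
# Hypothetical ultrametric: 2 inside {x, y} and inside {z, t}, 4 across.
ULTRA_D = from_pairs(POINTS, {
    ("x", "y"): 2, ("z", "t"): 2,
    ("x", "z"): 4, ("x", "t"): 4, ("y", "z"): 4, ("y", "t"): 4,
})
# Hypothetical 4-cycle x - z - y - t - x: 1 on the cycle edges, 2 on the diagonals.
CYCLE_D = from_pairs(POINTS, {
    ("x", "z"): 1, ("z", "y"): 1, ("y", "t"): 1, ("t", "x"): 1,
    ("x", "y"): 2, ("z", "t"): 2,
})

print(is_chordal_quasi_ultrametric(POINTS, ULTRA_D))  # True
print(is_quasi_ultrametric(POINTS, CYCLE_D))          # True
print(threshold_graphs_are_chordal(POINTS, CYCLE_D))  # False
```

The cycle example is a quasi-ultrametric whose threshold graph at level 1 is a chordless 4-cycle, so it satisfies the first condition of Definition 14 but not the second.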

One can finally use the results of Brucker (2001), who proved that tree distances are chordal quasi-ultrametrics; thus their dissimilarity clusters form a parsimonious cluster system (the converse is nevertheless false). Once again X-hypertrees meet X-trees.

2.3 Structure related characterization

Let $H = (X, E)$ be a cluster system. The couple $(E \cup \{\emptyset\}, \subseteq)$ is then a lattice.

Definition 15. A couple $T = (E, \le)$ is a lattice if $\le$ is an order relation on the set $E$ and for any $x, y \in E$:

- there exists a unique element $x \wedge y$ which is the largest element less than $x$ and $y$,
- there exists a unique element $x \vee y$ which is the smallest element larger than $x$ and $y$.

We will here only speak about finite lattices (the set $E$ of the lattice $T = (E, \le)$ is finite), so lattice and finite lattice have to be considered as synonyms in this paper. Then there exist for each lattice $T = (E, \le)$ a smallest and a largest element in $T$, which are denoted by $0_T$ and $1_T$, respectively.

For a lattice $T = (E, \le)$ with join and meet operators $\vee$ and $\wedge$, $x \in E$ is said to be:

- join-irreducible if $x = y \vee z$ implies $x = y$ or $x = z$,
- meet-irreducible if $x = y \wedge z$ implies $x = y$ or $x = z$,
- doubly irreducible if $x$ is both join- and meet-irreducible.

The lattices $T = (E \cup \{\emptyset\}, \subseteq)$ where $(X, E)$ is a cluster system are clearly in bijection with those lattices whose join-irreducible elements are exactly their atoms, in the following sense:

Definition 16. $x$ is an atom for a lattice $T = (E, \le)$ if $y < x$ implies $y = 0_T$.

Since we will here only speak about lattices whose join-irreducible elements are their atoms, we will assume that lattice and finite lattice whose join-irreducible elements are its atoms are synonyms. One can then associate to any lattice $T = (E, \le)$ a cluster system $H(T) = (X, E')$ where $X$ is the set of atoms of $T$ and $E' = \{A(y) \mid y \in E \setminus \{0_T\}\}$, where $A(y)$ is the set of all the elements of $X$ less than or equal to $y$ in $T$. It is moreover clear that there is a one-to-one correspondence between $(E' \cup \{\emptyset\}, \subseteq)$ and $(E, \le)$.

We will prove in this section that parsimonious cluster systems are in bijection with a well known lattice structure: dismantlable lattices.

Definition 17 (Rival, 1974). A lattice $T = (E, \le)$ is dismantlable if there exists a doubly irreducible element $x$ in $E$ and the lattice $T' = (E \setminus \{x\}, \le)$ remains dismantlable.

In order to prove the bijection, we will use the characterization of Theorem 10.

Definition 18. Let $T = (E, \le)$ be a lattice. A crown is a partially ordered set $\{x_1, y_1, x_2, y_2, \ldots, x_n, y_n\}$ in which $x_i < y_i$, $y_i > x_{i+1}$ for $1 \le i \le n - 1$, $x_n < y_n$ and $x_1 < y_n$ are the only comparability relations.

Theorem 10 (Kelly and Rival, 1974). Dismantlable lattices are exactly lattices with no crown.

Theorem 11. If $T = (E, \le)$ is a dismantlable lattice then $H(T)$ is a parsimonious cluster system, and conversely if $H = (X, E)$ is a parsimonious cluster system then $(E \cup \{\emptyset\}, \subseteq)$ is a dismantlable lattice.

Proof. Let $T$ be a dismantlable lattice and suppose that $H(T) = (X, E)$ is not a parsimonious cluster system. As in the proof of Theorem 5, there exist then $p > 1$, $p$ clusters $A_i \subseteq X$ ($1 \le i \le p$) of $H(T)$ and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$, for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$, $x_3 \notin A_1$ and $x_{p-1} \notin A_p$, and a cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ but not $x_j$ for one $1 < j < p + 1$. Let then $i_0$ be the largest integer less than $j$ such that $x_{i_0} \in A_{p+1}$ and $i_1$ the smallest integer larger than $j$ such that $x_{i_1} \in A_{p+1}$.

One can then extract a crown of $(E \cup \{\emptyset\}, \subseteq)$ from the ordered set $\{\{x_{i_0}\}, \delta_H(x_{i_0}, x_{i_0+1}), \{x_{i_0+1}\}, \delta_H(x_{i_0+1}, x_{i_0+2}), \ldots, \{x_{i_1}\}, \delta_H(x_{i_1}, x_{i_0})\}$, which is a contradiction.

Conversely, let $H = (X, E)$ be a parsimonious cluster system and suppose that the lattice $(E \cup \{\emptyset\}, \subseteq)$ admits a crown $\{A_1, B_1, A_2, B_2, \ldots, A_n, B_n\}$. One can assume that $n > 3$ since otherwise $H$ would not be a weak-hierarchical cluster system. Let $Y = X \setminus (B_n \cap B_2)$. Since $n > 3$, $\{A_1 \cap Y, B_1 \cap Y, A_2 \cap Y, B_2 \cap Y, \ldots, A_n \cap Y, B_n \cap Y\}$ is a crown of $(E_Y \cup \{\emptyset\}, \subseteq)$ and $(B_n \cap Y) \cap (B_2 \cap Y) = \emptyset$. The graph $G = (E_Y, E_G)$, where $\{A, B\} \in E_G$ if and only if $A \cap B \ne \emptyset$, will admit a cycle $(B_1 \cap Y, B_2 \cap Y, \ldots, B_n \cap Y, B_1 \cap Y)$. Since $\{B_2 \cap Y, B_n \cap Y\} \notin E_G$, there exists in $G$ a non-chordal cycle containing $B_n \cap Y$, $B_1 \cap Y$ and $B_2 \cap Y$. According to Theorem 3, $H_Y$ is not a hypertree, thus by Theorem 6, $H$ is not a parsimonious cluster system: contradiction.

For instance, Fig. 5 shows the Hasse diagrams of the cluster systems from Fig. 4. There is a crown in Fig. 5 (a) (in bold), which explains why the cluster system of Fig. 4 (a) cannot be a parsimonious cluster system.

Figure 5. (a) and (b) are Hasse diagrams of the cluster systems from Fig. 4 (a) and (b) respectively. (a) admits a crown (in bold).
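Definition 17 can be turned into a simple greedy procedure on the inclusion order of $E \cup \{\emptyset\}$. The sketch below is an illustration under several assumptions: the cluster systems are an assumed reading of Fig. 4 (four outer elements "1" to "4" around a central element "c"), doubly irreducible elements are detected as elements with exactly one lower and one upper cover, and the greedy strategy relies on the known fact, not stated in this paper, that the order in which doubly irreducible elements are removed does not matter for finite lattices.

```python
from itertools import combinations


def closure(ground, generators):
    """Singletons, the ground set, the generators and their nonempty intersections."""
    clusters = {frozenset([x]) for x in ground} | {frozenset(ground)}
    clusters |= {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(clusters), 2):
            if a & b and a & b not in clusters:
                clusters.add(a & b)
                changed = True
    return clusters


def is_dismantlable(clusters):
    """Greedy dismantling of the inclusion order on E plus the empty set."""
    elements = set(clusters) | {frozenset()}
    while len(elements) > 2:
        for a in list(elements):
            # `<` on frozensets is strict inclusion.
            below = [b for b in elements if b < a]
            above = [b for b in elements if a < b]
            lower_covers = [b for b in below if not any(b < c for c in below)]
            upper_covers = [b for b in above if not any(c < b for c in above)]
            if len(lower_covers) == 1 and len(upper_covers) == 1:
                elements.remove(a)  # a is doubly irreducible
                break
        else:
            return False  # stuck before reaching the two bounds
    return True


# Assumed reading of Fig. 4 (a) and (b).
X = "1234c"
E_A = closure(X, ["12c", "23c", "34c", "41c"])
E_B = closure(X, ["12c", "23c", "34c"])

print(is_dismantlable(E_A))  # False
print(is_dismantlable(E_B))  # True
```

For `E_A` the procedure removes the four outer singletons and then gets stuck on the pairs $\{x_i, c\}$ and the triples, which form the kind of crown highlighted in Fig. 5 (a).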

3 Using X-hypertrees

3.1 Computing an X-hypertree

To compute an X-hypertree one can use an algorithm given in Brucker (2001), which computes a chordal quasi-ultrametric from any given dissimilarity $d$ on $X$ in $O(|X|^4)$ operations. For instance, the lower triangular matrix of Tab. 1 is the dissimilarity $d$ used in Bandelt and Dress (1989). It represents the dissimilarities among six Hominidæ. The upper triangular matrix of Tab. 1 is the dissimilarity $q(d)$ obtained by the algorithm described in Brucker (2001).

Table 1. Dissimilarity $d$ between six Hominidæ (lower triangular matrix) and dissimilarity $q(d)$ (upper triangular matrix). Row and column labels: H. sapiens, P. paniscus, P. troglodytes, G. gorilla, P. pygmaeus, H. lar.

Computing the dissimilarity clusters of a chordal quasi-ultrametric is an easy task since it is a quasi-ultrametric. The dissimilarity clusters of $q(d)$ are listed in Tab. 2.

Table 2. Clusters of the dissimilarity $q(d)$ (upper triangular matrix in Tab. 1).

#    diameter   elements
1    0          H. lar
2    0          P. pygmaeus
3    0          H. sapiens
4    0          P. troglodytes
5    0          P. paniscus
6    0          G. gorilla
7               P. paniscus; P. troglodytes
8               P. troglodytes; H. sapiens
9               H. sapiens; P. paniscus; P. troglodytes
10              P. troglodytes; G. gorilla
11              P. paniscus; P. troglodytes; G. gorilla
12              H. sapiens; P. paniscus; P. troglodytes; G. gorilla
13              H. sapiens; P. pygmaeus
14              H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus
15              H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus; G. gorilla
16              H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus; H. lar
17              H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus; G. gorilla; H. lar

3.2 Drawing an X-hypertree

In order to draw a parsimonious cluster system, several approaches are possible. The first one is to draw its Hasse diagram. For instance, Fig. 6 is the Hasse diagram of the clusters from Tab. 2.

Figure 6. The Hasse diagram of the clusters from Tab. 2.

To avoid the non-planarity of the Hasse diagram, one can represent the clusters in 3 dimensions by using the spatial dendrograms of Batbedat (1990). Such an approach leads to a 3D representation of the Hasse diagram using the fact that each cluster of a parsimonious cluster system is a connected part of an underlying vertex tree (because parsimonious cluster systems are hypertrees). Nevertheless, for printing purposes one has to project the 3D view onto a plane, losing the benefits of this method.

We will now show a third method of representing a parsimonious cluster system, which divides the clusters into two parts, a hierarchical one and an overlapping one. Fig. 7 shows the result for the clusters of Tab. 2.

Figure 7. Cluster-oriented representation of the clusters from Tab. 2 (left), highlighting the elements from cluster 14 (right).

In order to construct these two parts we consider, for a given X-hypertree $((X, E), f)$:

- for $A \in E \setminus \{X\}$, define $\mathrm{succs}(A)$ as the elements that cover $A$, that is, $\mathrm{succs}(A) = \min\{B \mid A \subset B,\ B \in E\}$,
- for $A \subseteq X$, denote by $\sup(A)$ the smallest element (with respect to the inclusion order) of $E$ which includes (or equals) $A$. This element exists because $E$ is closed under intersection and $X \in E$.

The graph $G_H = (E, E_H)$, where $AB$ is an edge of $E_H$ if and only if $B = \sup(C)$ with $C$ the union of all the elements of $\mathrm{succs}(A)$, is a hierarchical tree rooted in $X$. This tree is the hierarchical part of Fig. 7, where each element $A$ of $E$ is at height $f(A)$. The overlapping part consists of all the edges of the graph $G_O = (E, E_O)$, where $AB$ is an edge of $E_O$ if $AB \notin E_H$ and $B \in \mathrm{succs}(A)$.

For instance, cluster 14 in Fig. 7 contains all the dashed elements. The cluster 8 is not a singleton (it is composed of H. sapiens and P. troglodytes), but it is a leaf of the hierarchical tree $G_H$. Such elements play the role of replacements for missing elements.

Moreover, if the X-hypertree is a strong hierarchy, $G_O$ is an empty graph and the hierarchical tree is exactly $G_H$. As Fig. 8 shows, interval hypergraphs have both a hierarchical and an overlapping part.

The graph $G_O$ can be interpreted either as a contamination of the strong hierarchy (elements are similar but in different ways, just like the example given in Section 2) or, from a phylogenetic point of view, as the result of different rates of evolution. This representation can also be interpreted like the reticulograms proposed by Legendre and Makarenkov (2002).

3.3 Leaves of a parsimonious cluster system

The above section gives a first way of interpreting a parsimonious cluster system: a contamination of a strong hierarchy by overlapping clusters. This interpretation can nevertheless be used for any cluster system. One particularity of parsimonious cluster systems is the existence of leaves. We introduced them informally in Section 2 as elements in a border. Theorem 12 will formally define them. Recall that a subset $C$ of $2^X$ is a chain if and only if for all $A$ and $B$ in $C$ either $A \subseteq B$ or $B \subseteq A$.
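The construction of $G_H$ and $G_O$ from Section 3.2 and the chain condition just recalled can both be tried on the clusters of Tab. 2. The sketch below is only illustrative: the function names (`succs`, `sup`, `split_edges`, `elements_with_chain`) and the data layout are assumptions, and the diameters are not needed for these purely set-theoretic computations.

```python
SPECIES = ["H. sapiens", "P. paniscus", "P. troglodytes",
           "G. gorilla", "P. pygmaeus", "H. lar"]
S, PA, T, G, PY, L = SPECIES

# Non-singleton clusters of Tab. 2 (clusters 7 to 17).
NON_SINGLETONS = [
    (PA, T), (T, S), (S, PA, T), (T, G), (PA, T, G), (S, PA, T, G),
    (S, PY), (S, PA, T, PY), (S, PA, T, PY, G), (S, PA, T, PY, L),
    tuple(SPECIES),
]
CLUSTERS = {frozenset([x]) for x in SPECIES} | {frozenset(c) for c in NON_SINGLETONS}


def succs(clusters, a):
    """Minimal clusters strictly containing a (the clusters covering a)."""
    above = [b for b in clusters if a < b]
    return [b for b in above if not any(c < b for c in above)]


def sup(clusters, a):
    """Smallest cluster containing the set a (unique by closure under intersection)."""
    return min((b for b in clusters if a <= b), key=len)


def split_edges(ground, clusters):
    """Return the edge sets of G_H and G_O as sets of (cluster, successor) pairs."""
    top = frozenset(ground)
    hierarchical, overlapping = set(), set()
    for a in clusters:
        if a == top:
            continue
        covering = succs(clusters, a)
        parent = sup(clusters, frozenset().union(*covering))
        hierarchical.add((a, parent))
        overlapping.update((a, b) for b in covering if b != parent)
    return hierarchical, overlapping


def elements_with_chain(ground, clusters):
    """Elements x whose clusters are pairwise comparable for inclusion."""
    def is_chain(family):
        return all(a <= b or b <= a for a in family for b in family)
    return {x for x in ground if is_chain([a for a in clusters if x in a])}


E_H, E_O = split_edges(SPECIES, CLUSTERS)
parents = {parent for _, parent in E_H}
print(frozenset([S, T]) in parents)                     # False
print(sorted(elements_with_chain(SPECIES, CLUSTERS)))   # ['G. gorilla', 'H. lar']
```

The first output agrees with the remark that cluster 8 is a leaf of the hierarchical tree $G_H$; the second lists the elements whose clusters form a chain, which are the leaves defined in Theorem 12.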

21 x y z t u v Figure 8. Cluster-oriented representation of the interval hypergraph from Fig. 2. Theorem 12. Let H = (X, E) be a parsimonious cluster system. There exists an x in X such that the set of all clusters of H containing x is a chain. Such an element is called a leaf. Proof. We will show it using the bijection of Theorem 11. Let H = (X, E) be a parsimonious cluster system and T = (E {φ}, ) its associated dismantlable lattice. We will prove the property by induction on X. For X = 2, the existence of leaf is clearly true. Suppose that for X = i any parsimonious cluster system on X admits a leaf, and let H = (X, E) be a parsimonious cluster system with X = i + 1. According to Therorem 11, the lattice T = (E {φ}, ) is dismantlable thus there exists a doubly irreducible atom in T. Let {x} (x X) be this atom. Since H X\{x} is also a parsimonious cluster system (Theorem 4) it satisfies the induction property and let y be a leaf of H X\{x}. Two cases may then occur. Either y is also a leaf of H and the property holds, or y is not a leaf of H. The second case implies that {x, y} is a cluster of H and since {x} is a doubly irreducible atom of T, any cluster A ( A > 1)of H containing x contains also y, thus A X\{x} is in the chain of clusters of H X\{x} containing y: x is a leaf of H, which concludes the proof. Since the restriction of a X-hypertree is also a X-hypertree it is clear that any parsimonious cluster system can be disembodied by iteratively removing a leaf (restrict the parsimonious cluster 21

system to all its elements but the leaf). Leaves of an X-hypertree play the same role as leaves of a tree (or an X-tree). Removing a leaf of an X-hypertree reveals another leaf. Moreover, using the same proof technique as for Theorem 12, one can prove:

Theorem 13. Any parsimonious cluster system admits at least 2 leaves.

For instance, the parsimonious cluster system of Tab. 2 contains 2 leaves: H. Lar and G. Gorilla. After removing these two leaves, P. Pygmaeus and P. Paniscus become leaves of the remaining parsimonious cluster system, and after removing them the last two elements, P. Troglodites and H. Sapiens, become leaves.

Leaves of a parsimonious cluster system have a practical use since they form the border of the cluster system. This idea is stressed by the fact that the restriction of a parsimonious cluster system to the set of its leaves clearly forms a strong hierarchy. Removing the leaves of a parsimonious cluster system (border of order 1) reveals an inner border (border of order 2), and so on. In a word, the complexity of an element is measured by the order of the border it belongs to. Finally, since the clusters containing a leaf form a chain, one can compare every element to a leaf: the closer in the chain, the closer to the leaf.

4 Conclusions

Parsimonious cluster systems and their valued version (X-hypertrees) are axiomatically designed to be a cluster system generalizing phylogenetic trees. They are in bijection with a dissimilarity model whose dissimilarity clusters can be easily computed. Moreover, a polynomial algorithm can approximate any given dissimilarity by a dissimilarity whose clusters form a parsimonious cluster system. Structurally, parsimonious cluster systems are in bijection with the well-studied family of dismantlable lattices. From a practical point of view, leaves of a parsimonious cluster system can be used like leaves in a tree (as a border containing the other

elements or using the chain of clusters to compare any element to them). This is ongoing research and a lot of work remains to be done to graphically represent those cluster systems, either using a 3D program to represent them in a 3-dimensional space or by rearranging the clusters in order to provide a planar (or an almost planar) representation.

5 References

Bandelt, H.-J., and Dress, W. M. (1989), Weak hierarchies associated with similarity measures: an additive clustering technique, Bulletin of Mathematical Biology, 51.
Batbedat, A. (1990), Les approches pyramidales dans la classification arborée, Masson: Paris.
Barthélemy, J.-P. and Brucker, F. (2008), Binary clustering, Discrete Applied Mathematics, 156.
Buneman, P. (1971), The recovery of trees from measures of dissimilarity, in Kendall D. G., and Tautu P. (eds), Mathematics in Archaeological and Historical Sciences, Edinburgh University Press.
Brucker, F. (2001), Modèles de classification en classes empiétantes, PhD Thesis, EHESS: France.
Brucker, F. (2005), From hypertrees to arboreal quasi-ultrametrics, Discrete Applied Mathematics, 147.
Brucker, F. (2006), Sub-dominant theory in numerical taxonomy, Discrete Applied Mathematics, 154.
Brucker, F., and Barthélemy, J.-P. (2007), Eléments de Classification, Hermes Publishing: London.
Diatta, J. and Fichet, B. (1994), From Apresjan hierarchies and Bandelt-Dress weak hierarchies to quasi-hierarchies, in Diday E., Lechevallier Y., Schader M., Bertrand P. (eds), New Approaches in Classification and Data Analysis, Springer-Verlag: Berlin.
Diatta, J. and Fichet, B. (1998), Quasi-ultrametrics and their 2-balls hypergraph, Discrete Mathematics, 192.

Duchet, P. (1978), Propriétés de Helly et problèmes de représentations, in Problèmes combinatoires et théorie des graphes, Colloques internationaux du CNRS 260.
Felsenstein, J. (1983), Numerical Taxonomy, Springer-Verlag: Berlin.
Flament, C. (1978), Hypergraphes arborés, Discrete Mathematics, 21.
Kelly, D. and Rival, I. (1974), Crowns, fences, and dismantlable lattices, Canad. J. Math., 26.
Legendre, P. and Makarenkov, V. (2002), Reconstruction of biogeographical and evolutionary networks using reticulograms, Systematic Biology, 51.
Lepouliquen, M. (2008), Filiation de manuscrits sanskrits par méthodes issues, pour partie, de la phylogénétique, PhD Thesis, EHESS: Paris.
Rival, I. (1974), Lattices with doubly irreducible elements, Canadian Mathematical Bulletin, 17.
Semple, C., and Steel, M. (2003), Phylogenetics, Oxford University Press: New York.
Saitou, N. and Nei, M. (1987), The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol., 4.
Tversky, A. (1977), Features of similarity, Psychological Review, 84.


More information

Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life Chapter focus Shifting from the process of how evolution works to the pattern evolution produces over time. Phylogeny Phylon = tribe, geny = genesis or origin

More information

NOTES ON DIOPHANTINE APPROXIMATION

NOTES ON DIOPHANTINE APPROXIMATION NOTES ON DIOPHANTINE APPROXIMATION Jan-Hendrik Evertse January 29, 200 9 p-adic Numbers Literature: N. Koblitz, p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd edition, Graduate Texts in Mathematics

More information

Generalized Pigeonhole Properties of Graphs and Oriented Graphs

Generalized Pigeonhole Properties of Graphs and Oriented Graphs Europ. J. Combinatorics (2002) 23, 257 274 doi:10.1006/eujc.2002.0574 Available online at http://www.idealibrary.com on Generalized Pigeonhole Properties of Graphs and Oriented Graphs ANTHONY BONATO, PETER

More information

Math 455 Some notes on Cardinality and Transfinite Induction

Math 455 Some notes on Cardinality and Transfinite Induction Math 455 Some notes on Cardinality and Transfinite Induction (David Ross, UH-Manoa Dept. of Mathematics) 1 Cardinality Recall the following notions: function, relation, one-to-one, onto, on-to-one correspondence,

More information

Proof Techniques (Review of Math 271)

Proof Techniques (Review of Math 271) Chapter 2 Proof Techniques (Review of Math 271) 2.1 Overview This chapter reviews proof techniques that were probably introduced in Math 271 and that may also have been used in a different way in Phil

More information

The Turán number of sparse spanning graphs

The Turán number of sparse spanning graphs The Turán number of sparse spanning graphs Noga Alon Raphael Yuster Abstract For a graph H, the extremal number ex(n, H) is the maximum number of edges in a graph of order n not containing a subgraph isomorphic

More information

On minimal models of the Region Connection Calculus

On minimal models of the Region Connection Calculus Fundamenta Informaticae 69 (2006) 1 20 1 IOS Press On minimal models of the Region Connection Calculus Lirong Xia State Key Laboratory of Intelligent Technology and Systems Department of Computer Science

More information

Introduction to generalized topological spaces

Introduction to generalized topological spaces @ Applied General Topology c Universidad Politécnica de Valencia Volume 12, no. 1, 2011 pp. 49-66 Introduction to generalized topological spaces Irina Zvina Abstract We introduce the notion of generalized

More information

CLASS NOTES FOR APRIL 14, 2000

CLASS NOTES FOR APRIL 14, 2000 CLASS NOTES FOR APRIL 14, 2000 Announcement: Section 1.2, Questions 3,5 have been deferred from Assignment 1 to Assignment 2. Section 1.4, Question 5 has been dropped entirely. 1. Review of Wednesday class

More information

On Torsion-by-Nilpotent Groups

On Torsion-by-Nilpotent Groups On Torsion-by-Nilpotent roups. Endimioni C.M.I. - Université de Provence, UMR-CNRS 6632, 39, rue F. Joliot-Curie, 13453 Marseille Cedex 13, France E-mail: endimion@gyptis.univ-mrs.fr and. Traustason Department

More information

4.1 Notation and probability review

4.1 Notation and probability review Directed and undirected graphical models Fall 2015 Lecture 4 October 21st Lecturer: Simon Lacoste-Julien Scribe: Jaime Roquero, JieYing Wu 4.1 Notation and probability review 4.1.1 Notations Let us recall

More information

Connectivity and tree structure in finite graphs arxiv: v5 [math.co] 1 Sep 2014

Connectivity and tree structure in finite graphs arxiv: v5 [math.co] 1 Sep 2014 Connectivity and tree structure in finite graphs arxiv:1105.1611v5 [math.co] 1 Sep 2014 J. Carmesin R. Diestel F. Hundertmark M. Stein 20 March, 2013 Abstract Considering systems of separations in a graph

More information

arxiv: v1 [cs.ds] 1 Nov 2018

arxiv: v1 [cs.ds] 1 Nov 2018 An O(nlogn) time Algorithm for computing the Path-length Distance between Trees arxiv:1811.00619v1 [cs.ds] 1 Nov 2018 David Bryant Celine Scornavacca November 5, 2018 Abstract Tree comparison metrics have

More information