
Parsimonious cluster systems

François Brucker (1) and Alain Gély (2)

(1) Laboratoire LIF, UMR 6166, Centre de Mathématiques et Informatique, 39 rue Joliot-Curie, F-13453 Marseille Cedex 13.
(2) LITA, Paul Verlaine University, Ile du Saulcy, BP 80794, 57012 Metz Cedex, France. Email: alain.gely@univ-metz.fr

Abstract. We introduce in this paper a new clustering structure, parsimonious cluster systems, which generalizes phylogenetic trees. We characterize it as the set of hypertrees stable under restriction and prove that this set is in bijection with a known dissimilarity model: chordal quasi-ultrametrics. We then present one possible way to graphically represent elements of this model.

Keywords: overlapping clustering, parsimony, phylogenetic trees, dissimilarities.

1 Introduction

In systematic biology, to infer the evolutionary history between DNA sequences or animal species one usually assumes that this history is parsimonious. In some sense, the parsimony principle is the biological formulation of the maximum likelihood approach in statistics or the energy minimization principle in physics. In a word, the simpler the better. Thus:

- Every species has only one immediate ancestor.
- It is simpler to conserve a species than to recreate it.

The above two remarks lead to the fact that the graph which links a species to its direct ancestor is a tree (the phylogenetic or evolution tree), the eldest ancestor, when it exists, being named the root (see for instance Semple and Steel, 2003).

The parsimony principle associated with phylogenetic trees is a good model for many situations outside biology, for instance in psychology (Tversky, 1977), in text-mining (Lepouliquen, 2008) or more generally in all fields where the studied objects (e.g. DNA) or parts of them (e.g. genes) are in some way transmitted, combined, or modified over time.

Definition 1. A dissimilarity $d$ on a set $X$ is a symmetric function from $X \times X$ to the set of non-negative real numbers for which $d(x, y) = 0$ if $x = y$. A proper dissimilarity $d$ on $X$ is a dissimilarity for which $d(x, y) = 0$ if and only if $x = y$. Since we will only speak about proper dissimilarities, we will consider that a dissimilarity is always proper. A distance on $X$ is a dissimilarity such that for all $x, y, z \in X$, $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

In this paper the finite set $X$ denotes the set of objects studied, usually the species existing today in phylogenetics. In a phylogenetic tree the vertices are then the elements of $X$ and their common ancestors, called latent vertices (see for instance Felsenstein, 1983). When the edges of such a tree are numerically valued by a difference between the two linked vertices (the smaller the value, the closer the vertices), the distance between two arbitrary vertices, computed by summing the values of the edges of the path joining them, is a tree distance. A valued phylogenetic tree is usually called an X-tree.

A well known result (Buneman, 1971, among others) shows that tree distances and X-trees are in bijection. To each distance $d$ on $X$ satisfying the property of Theorem 1 below can be associated a unique X-tree for which the sum of the values of the edges in the path between two elements of $X$ is equal to $d$. Here the latent vertices of the tree (ancestors in phylogeny) can be considered to have been introduced in order to preserve the distance property (evolution history).

Theorem 1 (Buneman, 1971, among others).
A distance $d$ on $X$ is a tree distance if and only if for all $x, y, z, t \in X$:

$d(x, y) + d(z, t) \le \max\{d(x, z) + d(y, t),\ d(x, t) + d(y, z)\}$   (1)
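Condition (1), the four-point condition, can be tested directly on a finite dissimilarity. The following is a minimal sketch; the function names, the nested-dictionary representation of $d$ and the two small examples are illustrative assumptions, not part of the original paper.

```python
from itertools import product


def make_dissimilarity(points, values):
    """Build a symmetric nested dict d[x][y] from {(x, y): value} (assumed helper)."""
    d = {p: {q: 0 for q in points} for p in points}
    for (p, q), v in values.items():
        d[p][q] = d[q][p] = v
    return d


def is_tree_distance(points, d, tol=1e-9):
    """Check inequality (1) for every quadruple x, y, z, t (repetitions allowed)."""
    for x, y, z, t in product(points, repeat=4):
        lhs = d[x][y] + d[z][t]
        rhs = max(d[x][z] + d[y][t], d[x][t] + d[y][z])
        if lhs > rhs + tol:
            return False
    return True


# Hypothetical tree: x, y hang from u, z, t hang from v, all five edges of length 1.
tree = make_dissimilarity("xyzt", {
    ("x", "y"): 2, ("z", "t"): 2,
    ("x", "z"): 3, ("x", "t"): 3, ("y", "z"): 3, ("y", "t"): 3,
})
# Shortest-path distance on a 4-cycle a-b-c-e-a: not realizable by a tree.
cycle = make_dissimilarity("abce", {
    ("a", "b"): 1, ("b", "c"): 1, ("c", "e"): 1, ("a", "e"): 1,
    ("a", "c"): 2, ("b", "e"): 2,
})
print(is_tree_distance("xyzt", tree), is_tree_distance("abce", cycle))
```

Allowing repeated points in the quadruple makes the check also cover the triangle inequality, at the cost of $O(|X|^4)$ comparisons.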

One of the benefits of dealing with objects described by a dissimilarity is that one can find groups of similar elements by considering dissimilarity clusters.

Definition 2. Let $d$ be a dissimilarity on $X$. The diameter for $d$ of a set $A \subseteq X$, written $\mathrm{diam}_d(A)$, is the largest dissimilarity between two elements of this set.

Definition 3. A dissimilarity cluster for a dissimilarity $d$ on $X$ is a subset $A$ of $X$ with $\mathrm{diam}_d(A) < \min\{\mathrm{diam}_d(A \cup \{x\}) \mid x \in X \setminus A\}$.

Dissimilarity clusters are a generalization of maximal cliques in a graph.

Definition 4. A threshold graph at level $\alpha$ for a dissimilarity $d$ on $X$ is the graph $G_\alpha = (X, E_\alpha)$ where $xy \in E_\alpha$ if and only if $d(x, y) \le \alpha$.

Indeed, if $d$ is a dissimilarity on $X$, the dissimilarity clusters of $d$ are exactly the maximal cliques of all the threshold graphs of $d$. A given dissimilarity cluster $A$ of some dissimilarity $d$ on $X$ then has two properties:

- external isolation, because it is a maximal clique of the graph $G_{\mathrm{diam}_d(A)}$,
- an internal cohesion measure, because the dissimilarity between any pair of elements in $A$ is at most $\mathrm{diam}_d(A)$. The smaller the diameter, the greater the internal cohesion.

Definition 5. A distance $d$ on $X$ is an ultrametric if for all $x, y, z \in X$:

$d(x, z) \le \max\{d(x, y),\ d(y, z)\}$   (2)

Definition 6. A strong hierarchy $E$ on $X$ is a subset of $2^X$ such that for all $A, B \in E$: $A \cap B \in \{\emptyset, A, B\}$.

When a tree distance $d$ on $X$ is also an ultrametric (ultrametrics are the distances whose clusters form a strong hierarchy), the latent vertices of its associated X-tree are exactly the dissimilarity clusters of $d$. Due to the so-called long-branch issue, this is no longer the case in general. For instance, the tree distance on $X = \{x, y, z, t\}$ associated with the valued tree of Fig. 1 (a) is an ultrametric and its dissimilarity clusters are $\{\{x\}, \{y\}, \{z\}, \{t\}, \{x, y\}, \{z, t\}, \{x, y, z, t\}\}$. This is no longer true for the tree distance associated with the tree of Fig. 1 (b): even though the tree shape is the same, its dissimilarity clusters are $\{\{x\}, \{y\}, \{z\}, \{t\}, \{z, t\}, \{x, z, t\}, \{y, z, t\}, \{x, y, z, t\}\}$, so the latent vertex $u$ does not correspond to the cluster $\{x, y\}$ anymore. The long branches $ux$ and $uy$ have drastically changed the dissimilarity clusters.

[Figure 1. Long branches issue: two valued trees (a) and (b) with the same shape on the leaves x, y, z, t; in (b) the branches ux and uy are long.]

We propose to directly use the concept of parsimony for defining a clustering structure adapted to a phylogeny instead of inferring clusters from latent nodes.

This paper is organized as follows. First, we formally introduce a new clustering model called parsimonious cluster system. Then we characterize it, and show that there are bijections between parsimonious cluster systems, a known dissimilarity structure (chordal quasi-ultrametrics) and a known lattice structure (dismantlable lattices). Finally, we illustrate on a short example a known algorithm which computes a chordal quasi-ultrametric, and propose a possible graphical representation of a given parsimonious cluster system.

2 Parsimonious cluster system

In classification, clusters comprise objects of a set $X$ which have something in common. Before precisely defining the notion, we will use it to motivate the concept of parsimonious cluster systems.
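Definitions 2 and 3 translate into a brute-force enumeration over all subsets of $X$, which is enough to replay the long-branch example on four leaves. The sketch below is illustrative only: the helper names are hypothetical, and the edge lengths (all equal to 1 in (a); $ux = uy = 3$ and the other edges equal to 1 in (b)) are an assumption chosen to agree with the cluster lists stated in the text.

```python
from itertools import combinations


def four_leaf_tree(lx, ly, lz, lt, inner):
    """Tree distance for leaves x, y attached to u and z, t attached to v.

    lx, ly, lz, lt are the pendant edge lengths, inner is the length of uv.
    """
    pendant = {"x": lx, "y": ly, "z": lz, "t": lt}
    side = {"x": "u", "y": "u", "z": "v", "t": "v"}
    d = {a: {b: 0 for b in pendant} for a in pendant}
    for a, b in combinations(pendant, 2):
        cross = inner if side[a] != side[b] else 0
        d[a][b] = d[b][a] = pendant[a] + pendant[b] + cross
    return d


def diameter(d, subset):
    """Definition 2: largest dissimilarity inside the subset (0 for a singleton)."""
    return max((d[a][b] for a, b in combinations(subset, 2)), default=0)


def dissimilarity_clusters(points, d):
    """Definition 3: subsets whose diameter strictly grows when any point is added."""
    clusters = set()
    for size in range(1, len(points) + 1):
        for subset in combinations(points, size):
            outside = [p for p in points if p not in subset]
            diam = diameter(d, subset)
            if all(diam < diameter(d, subset + (p,)) for p in outside):
                clusters.add(frozenset(subset))
    return clusters


points = ["x", "y", "z", "t"]
clusters_a = dissimilarity_clusters(points, four_leaf_tree(1, 1, 1, 1, 1))
clusters_b = dissimilarity_clusters(points, four_leaf_tree(3, 3, 1, 1, 1))
for name, found in (("(a)", clusters_a), ("(b)", clusters_b)):
    print(name, sorted("".join(sorted(c)) for c in found))
```

The enumeration is exponential in $|X|$ and is only meant for toy examples; for a quasi-ultrametric the clusters can be obtained much more cheaply from pairs of points, as discussed in Section 2.2.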

Suppose that a species $S_1$ with characteristics $(u, v, w)$ (among others) evolves into a first species $S_2$ with characteristics $(U, v, w)$ and a second one, $S_3$, keeping the three characteristics of its ancestor. Then, species $S_2$ evolves into species $S_4$ and $S_5$ having $(U, V, w)$ and $(U, v, w)$ as characteristics, respectively. Thus, considering only the living species:

- Animals of species $S_4$ and $S_5$ share the characteristics $U$ and $w$,
- Animals of species $S_3$ and $S_5$ share the characteristics $v$ and $w$,
- The characteristic $w$ shared by $S_3$ and $S_4$ also belongs to $S_5$.

Species $S_3$ and $S_4$ represent a border of this phylogeny because $S_5$ shares only some properties with each of them: properties shared by two elements of the border of a phylogeny are also shared by elements which are between them.

This can be stated as a cluster property when grouping elements which share one or more properties. Let $E \subseteq 2^X$ be a set of clusters. For all $x_1, x_2, x_3 \in X$: if there exists a cluster $A_1 \in E$ for which $x_1, x_2 \in A_1$ and $x_3 \notin A_1$ (animals sharing properties $U$ and $w$), and if there exists a cluster $A_2 \in E$ for which $x_2, x_3 \in A_2$ and $x_1 \notin A_2$ (animals sharing properties $v$ and $w$), then for any cluster $A_3 \in E$ such that $x_1, x_3 \in A_3$ (animals in $A_3$ share at least one property with animals $x_1$ and $x_3$), we also have $x_2 \in A_3$ (animals sharing property $w$).

To be a parsimonious cluster system, we postulate that this must also be true for an arbitrary number of clusters: if there exist $p > 1$ clusters $A_i \subseteq X$ ($1 \le i \le p$) and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which, for all $1 \le i \le p$, we have $x_i, x_{i+1} \in A_i$ and, for all $1 < i < p$, we have $x_j \notin A_i$ for $j = i - 1, i + 2$, $x_3 \notin A_1$ and if, moreover, $x_{p-1} \notin A_p$, then any cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ also contains $x_j$ for $1 \le j \le p + 1$.
That is, if there exist $p + 1$ animals $x_1, \dots, x_{p+1}$ such that for any given $i$ ($1 < i < p + 1$), $x_i$ shares only part of its properties with $x_{i-1}$ and with $x_{i+1}$ (there exist two property clusters, one containing $x_{i-1}$ and $x_i$ but not $x_{i+1}$ and the other containing $x_i$ and $x_{i+1}$ but not $x_{i-1}$), then all the properties shared by both $x_1$ and $x_{p+1}$ are also shared by all the $x_j$ ($1 \le j \le p + 1$). The species $x_{i-1}$ and $x_{i+1}$ can be seen as a local border (the condition with 3 clusters) which is extended to a global border containing $x_1$ and $x_{p+1}$: it is simpler to conserve the properties shared by a border for animals between them than to create new ones.

A parsimonious cluster system is then a cluster system satisfying the above condition. There are several ways of defining a cluster system. We chose the following one:

Definition 7. A cluster system $H$ on $X$ is a couple $(X, E)$ such that $E$ is a subset of $2^X$ for which:

- $\emptyset \notin E$,
- $\{x\} \in E$ for all $x \in X$,
- $X \in E$,
- $A, B \in E$ and $A \cap B \ne \emptyset$ implies $A \cap B \in E$.

Elements of $E$ are said to be clusters of $H$.

In its full generality the last condition, the closure, is not really needed for defining a cluster system, but since we will only speak about subsets of $2^X$ closed under finite intersection, we decided to add it to the definition. A formal definition of parsimonious cluster system is then:

Definition 8. A parsimonious cluster system $H = (X, E)$ is a cluster system such that for all $p > 1$, if there exist $p$ clusters $A_i \subseteq X$ ($1 \le i \le p$) and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which

- for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$,
- for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$,
- $x_3 \notin A_1$ and $x_{p-1} \notin A_p$,

then any cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ also contains $x_j$ for $1 \le j \le p + 1$.

For $p = 2$, the above condition is equivalent to the fact that for any three clusters $A$, $B$ and $C$ in $H$, we have $A \cap B \cap C \in \{A \cap B,\ B \cap C,\ A \cap C\}$. This condition is known in overlapping clustering as the weak-hierarchy condition (Bandelt and Dress, 1989). Parsimonious cluster systems are then a special case of weak-hierarchical cluster systems. This allows us to use the formalism of binary clustering (Barthélemy and Brucker, 2008) in order to define precisely the notions of external isolation and internal cohesion for clusters of a parsimonious cluster system.

It is clear that strong hierarchies and interval hypergraphs (cluster systems on $X$ for which each cluster is an interval of a given linear order on $X$) are parsimonious cluster systems. For instance, Fig. 2 shows the Hasse diagram of an interval hypergraph for the linear order $x < y < z < t < u < v$.

[Figure 2. Interval hypergraphs are parsimonious cluster systems: Hasse diagram of an interval hypergraph with clusters numbered 1 to 12 on the elements x, y, z, t, u, v.]

Definition 9. A binary dissimilarity $\delta$ on $X$ is a symmetric function from $X \times X$ to $2^X$ such that:

- $x, y \in \delta(x, y)$ for all $x, y \in X$,
- $z, t \in \delta(x, y)$ implies $\delta(z, t) \subseteq \delta(x, y)$ for all $x, y \in X$,
- $\delta(x, x) = \{x\}$ for all $x \in X$,
- there exist $x, y \in X$ such that $\delta(x, y) = X$.

Binary dissimilarities are a general tool for manipulating cluster systems. In our case, we will use the bijection between weak-hierarchical cluster systems $H = (X, E)$ and a subclass of the binary dissimilarities on $X$:

Theorem 2 (Barthélemy and Brucker, 2008). A weak-hierarchical binary dissimilarity on $X$ is a binary dissimilarity $\delta$ on $X$ such that for all $x_1, x_2, x_3 \in X$: if $x_3 \notin \delta(x_1, x_2)$ and $x_1 \notin \delta(x_2, x_3)$ then $x_2 \in \delta(x_1, x_3)$. There is a one-to-one correspondence between the set of weak-hierarchical cluster systems on $X$ and the set of all weak-hierarchical binary dissimilarities on $X$.

Let $H = (X, E)$ be a weak-hierarchical cluster system. The mapping of Theorem 2 associates to $H$ a weak-hierarchical binary dissimilarity $\delta_H$ such that, for all $x, y \in X$:

$\delta_H(x, y) = \min\{A \mid x, y \in A;\ A \in E\}$

For instance, consider the dendrogram of Fig. 3 representing a strong hierarchy (which is also a weak-hierarchical cluster system). Considered as a binary dissimilarity, we have:

- the cluster 1: $\{x, y\} = \delta(x, y)$,
- the cluster 2: $\{x, y, z\} = \delta(x, z) = \delta(y, z)$,
- the cluster 3: $\{u, v\} = \delta(u, v)$,
- the cluster 4: $\{x, y, z, t, u, v\} = \delta(x, t) = \delta(x, u) = \delta(x, v) = \delta(y, t) = \delta(y, u) = \delta(y, v) = \delta(z, t) = \delta(z, u) = \delta(z, v) = \delta(t, u) = \delta(t, v)$.

[Figure 3. A strong hierarchy: dendrogram with clusters numbered 1 to 4 on the elements x, y, z, t, u, v.]

A parsimonious binary dissimilarity can then be defined as follows:

Definition 10. A parsimonious binary dissimilarity on $X$ is a binary dissimilarity $\delta$ on $X$ for which, for every sequence $x_1, x_2, \dots, x_p \in X$, $x_{i-1} \notin \delta(x_i, x_{i+1})$ and $x_{i+1} \notin \delta(x_{i-1}, x_i)$ for any $1 < i < p$ imply that $x_i \in \delta(x_1, x_p)$ for all $1 \le i \le p$.

It is clear from the definition of a binary dissimilarity and by Theorem 2 that parsimonious cluster systems on $X$ are in bijection with parsimonious binary dissimilarities. This bijection provides a precise specification of the notions of parsimony, external isolation and internal cohesion for clusters. The term parsimonious in parsimonious cluster systems is motivated by the fact that if we already have $\delta(x_1, x_2), \delta(x_2, x_3), \dots, \delta(x_{p-1}, x_p)$, it is simpler to assume that $\delta(x_1, x_p)$ contains the path $\delta(x_i, x_{i+1})$ (for $1 \le i < p$) from $x_1$ to $x_p$ than to create a new cluster.

Moreover, let $H = (X, E)$ be a parsimonious cluster system on $X$. Each particular cluster $A \in E$ is generated by two elements $x$ and $y$ ($\delta_H(x, y) = A$), just like an ancestor is defined as a predecessor of two new species. Finally: for any $z, t \in A$, $\delta_H(z, t) \subseteq A$ (internal cohesion) and for any $z \notin A$, either $A \subseteq \delta_H(x, z)$ or $A \subseteq \delta_H(y, z)$ (external isolation).

The next sections will fully characterize parsimonious cluster systems as a particular cluster system, as a particular dissimilarity, and as a particular lattice, respectively.

2.1 Cluster related characterization

We will prove in this section that parsimonious cluster systems are exactly the hypertrees stable under restriction.

Definition 11. A hypertree $H = (X, E)$ is a cluster system for which there exists an underlying vertex tree $T = (X, E_T)$ with edge set $E_T$ such that all clusters of $H$ are connected parts of $T$.

Theorem 3 (Duchet, 1978; Flament, 1978). A cluster system $H = (X, E)$ is a hypertree if and only if:

- whenever $A_1, \dots, A_p \in E$ and $A_i \cap A_j \ne \emptyset$ for all $1 \le i < j \le p$, then $A_1 \cap \dots \cap A_p \ne \emptyset$ (Helly property),
- the graph $G = (E, F)$, where $\{A, B\} \in F$ if and only if $A \cap B \ne \emptyset$, is chordal.

Recall that a graph $G = (X, E)$ is said to be chordal if every cycle of length greater than 3 has a chord, that is, if for any $p > 3$ and distinct $x_1, \dots, x_p \in X$: $x_i x_{i+1} \in E$ for $1 \le i < p$ and $x_1 x_p \in E$ imply that there exists a pair $(i, j) \ne (1, p)$ with $1 \le i < j - 1 < p$ such that $x_i x_j \in E$.

The first condition of the above theorem is always satisfied by weak-hierarchical cluster systems, thus by parsimonious cluster systems. Moreover, since parsimonious cluster systems are in bijection with parsimonious binary dissimilarities, it is clear using the second axiom of Def. 9 and the definition of a parsimonious binary dissimilarity that the second condition is also satisfied. Thus, a parsimonious cluster system is a hypertree.

Let $H = (X, E)$ be a cluster system. The restriction of $H$ to a subset $Y \subseteq X$ is defined as $H_Y = (Y, E_Y)$ with $E_Y = \{A \cap Y \mid A \in E;\ A \cap Y \ne \emptyset\}$. For all $Y \subseteq X$, it is clear that $H_Y$ is also a cluster system. Theorem 4 shows that the notion of parsimonious cluster system is stable under restriction.

Theorem 4. Let $H = (X, E)$ be a parsimonious cluster system. Then, for all $Y \subseteq X$, the cluster system $H_Y$ is also a parsimonious cluster system.

Proof. Suppose that there exists $Y \subseteq X$ such that $H_Y = (Y, E_Y)$ is not a parsimonious cluster system. There exists then $p > 1$ for which there exist $p$ clusters $A_i \subseteq Y$ ($1 \le i \le p$) of $H_Y$ and $p + 1$ objects $x_i \in Y$ ($1 \le i \le p + 1$) for which for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$, and for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$, $x_3 \notin A_1$ and $x_{p-1} \notin A_p$, and a cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ but not $x_j$ for some $1 < j < p + 1$. Since $x_1, \dots, x_{p+1} \in Y$, if $A_i \notin E$ (with $1 < i < p$) there exists $B_i \in E$ with $B_i \cap Y = A_i$ such that $x_i, x_{i+1} \in B_i$ and $x_j \notin B_i$ for $j = i - 1, i + 2$. Since the argument also holds for $A_1$, $A_p$ and $A_{p+1}$, one can form $p + 1$ clusters of $H$ which violate the parsimonious cluster system definition: this is a contradiction.
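The basic objects used so far (the axioms of Definition 7, the weak-hierarchy condition, the map $\delta_H$ and the restriction $H_Y$) are easy to manipulate when clusters are stored as frozensets. The sketch below is a minimal illustration under that assumption; all function names are hypothetical, and the examples are a small interval hypergraph and a three-element system built to violate the weak-hierarchy condition.

```python
from itertools import combinations


def is_cluster_system(X, E):
    """Definition 7: no empty set, all singletons, X itself, closure under intersection."""
    X = frozenset(X)
    if frozenset() in E or X not in E or any(not A <= X for A in E):
        return False
    if any(frozenset([x]) not in E for x in X):
        return False
    return all((A & B) in E for A, B in combinations(E, 2) if A & B)


def is_weak_hierarchy(E):
    """For all clusters A, B, C: the triple intersection is one of the pairwise ones."""
    return all((A & B & C) in (A & B, B & C, A & C)
               for A, B, C in combinations(E, 3))


def delta(E, x, y):
    """delta_H(x, y): the smallest cluster containing both x and y."""
    return min((A for A in E if x in A and y in A), key=len)


def restrict(E, Y):
    """Clusters of H_Y: non-empty traces of the clusters of H on Y."""
    Y = frozenset(Y)
    return {A & Y for A in E if A & Y}


def intervals(order):
    """All intervals of a linear order given as a string or list."""
    n = len(order)
    return {frozenset(order[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


E = intervals("xyzt")
print(is_cluster_system("xyzt", E), is_weak_hierarchy(E))
print(sorted(delta(E, "x", "z")))
print(restrict(E, "xzt") == intervals("xzt"))

# Three pairwise-overlapping pairs with empty common intersection.
triangle = {frozenset(s) for s in ("a", "b", "c", "ab", "bc", "ac", "abc")}
print(is_cluster_system("abc", triangle), is_weak_hierarchy(triangle))
```

Taking the minimum by cardinality in `delta` is justified by the closure axiom: the intersection of all clusters containing $x$ and $y$ is itself a cluster, so the smallest one is unique.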

Since parsimonious cluster systems are hypertrees, Theorem 4 shows that parsimonious cluster systems are part of the set of hypertrees that are stable under restriction. The converse is also true:

Theorem 5. Let $H = (X, E)$ be a hypertree such that for all $Y \subseteq X$, the cluster system $H_Y = (Y, E_Y)$ is a hypertree. Then the cluster system $H$ is a parsimonious cluster system.

Proof. Suppose that $H = (X, E)$ is not a parsimonious cluster system. There exist then $p > 1$, $p$ clusters $A_i \subseteq X$ ($1 \le i \le p$) of $H$ and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$, and for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$, $x_3 \notin A_1$ and $x_{p-1} \notin A_p$, and a cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ but not $x_j$ for some $1 < j < p + 1$. Let then $i_0$ be the largest integer less than $j$ such that $x_{i_0} \in A_{p+1}$ and $i_1$ the smallest integer larger than $j$ such that $x_{i_1} \in A_{p+1}$. The restriction of $H$ to $\{x_{i_0}, \dots, x_{i_1}\}$ then violates either the first condition of Theorem 3, if $i_0 = j - 1$ and $i_1 = j + 1$, or the second condition in the other cases. This contradicts the hypothesis that every restriction of $H$ is a hypertree.

Theorems 4 and 5 lead to Theorem 6, which fully characterizes parsimonious cluster systems:

Theorem 6. For a cluster system $H = (X, E)$ the two following propositions are equivalent:

- $H$ is a parsimonious cluster system,
- $H_Y$ is a hypertree for all $Y \subseteq X$.

Fig. 4 shows that not all hypertrees are stable under restriction. The cluster system of Fig. 4 (a) is not a parsimonious cluster system (although it is a hypertree and a weak-hierarchical cluster system) because the restriction of this hypertree to $\{x_1, \dots, x_4\}$ induces the clusters $\{x_1, x_2\}$, $\{x_2, x_3\}$, $\{x_3, x_4\}$ and $\{x_4, x_1\}$, which form a cycle. After deleting one of the clusters of this hypertree (Fig. 4 (b)), the result is a parsimonious cluster system (no restriction can create a cycle anymore). This supports the idea of using parsimonious cluster systems as an evolution model: the history does not change when considering the entire animal kingdom or only a part of it.
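Theorem 6 suggests a direct, if exponential, test: check the two conditions of Theorem 3 on every restriction. The sketch below does this by brute force. It is only an illustration: the function names are hypothetical, chordality is tested by repeatedly deleting a simplicial vertex (a standard characterization of chordal graphs, not taken from the paper), and the example cluster system is an assumed reconstruction in the spirit of Fig. 4, with a central element `c` and four outer elements.

```python
from itertools import combinations


def is_chordal(vertices, adjacent):
    """A graph is chordal iff simplicial vertices can be deleted until none is left."""
    remaining = set(vertices)
    while remaining:
        for v in remaining:
            nbrs = [u for u in remaining if u != v and adjacent(u, v)]
            if all(adjacent(a, b) for a, b in combinations(nbrs, 2)):
                remaining.remove(v)
                break
        else:
            return False  # no simplicial vertex: an induced cycle of length > 3 exists
    return True


def has_helly_property(E):
    """Every pairwise-intersecting family of clusters has a common element."""
    E = list(E)
    for r in range(2, len(E) + 1):
        for family in combinations(E, r):
            if all(A & B for A, B in combinations(family, 2)):
                if not frozenset.intersection(*family):
                    return False
    return True


def is_hypertree(E):
    """Theorem 3: Helly property plus chordal intersection graph of the clusters."""
    return has_helly_property(E) and is_chordal(E, lambda A, B: bool(A & B))


def restrict(E, Y):
    Y = frozenset(Y)
    return {A & Y for A in E if A & Y}


def stable_under_restriction(X, E):
    """Second proposition of Theorem 6, checked on every non-empty subset Y of X."""
    X = list(X)
    return all(is_hypertree(restrict(E, Y))
               for r in range(1, len(X) + 1) for Y in combinations(X, r))


# Assumed example: four outer elements around a central element c.
outer = ["x1", "x2", "x3", "x4"]
X = outer + ["c"]
E = {frozenset([v]) for v in X} | {frozenset([x, "c"]) for x in outer}
E |= {frozenset([outer[i], outer[(i + 1) % 4], "c"]) for i in range(4)}
E.add(frozenset(X))

print(is_hypertree(E))                      # a star centred on c supports every cluster
print(is_hypertree(restrict(E, outer)))     # the four pairs form a chordless cycle
print(stable_under_restriction(X, E))
print(stable_under_restriction(X, E - {frozenset(["x4", "x1", "c"])}))
```

Both the Helly test and the loop over all $Y \subseteq X$ are exponential, so this is only usable on very small systems; it is meant to make the statement of Theorem 6 concrete, not to be an efficient recognition algorithm.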

[Figure 4. Two cluster systems on the elements x_1, x_2, x_3, x_4 and a central element. (a) is not a parsimonious cluster system; (b) is a parsimonious cluster system with five elements, x_1, x_2, x_3, x_4 and the central element.]

2.2 Metric related characterization

Just as valued phylogenetic trees are called X-trees, we denote by X-hypertrees the valued parsimonious cluster systems. We will speak about X-hypertrees even if the base set of the associated parsimonious cluster system is not $X$. More formally:

Definition 12. Let $H = (X, E)$ be a parsimonious cluster system and $f$ a function from $E$ to the set of non-negative real numbers such that $f(A) = 0 \iff |A| = 1$ and $A \subset B \implies f(A) < f(B)$. The couple $(H, f)$, or only $H$ when $f$ is obvious, is called an X-hypertree.

We know (Diatta and Fichet, 1998) that for any weak-hierarchical cluster system $H = (X, E)$ there exists a quasi-ultrametric on $X$ whose dissimilarity clusters are exactly the subsets of $X$ contained in $E$.

Definition 13 (Diatta and Fichet, 1998). A dissimilarity $d$ on $X$ is a quasi-ultrametric if for all $x, y, z, t \in X$ such that $\max\{d(x, z), d(y, z)\} \le d(x, y)$:

$d(z, t) \le \max\{d(x, y),\ d(x, t),\ d(y, t)\}$   (3)

This condition is called the four point inequality. Moreover, the dissimilarity clusters for a quasi-ultrametric $d$ on $X$ are exactly the sets $B(x, y) = \{z \mid \max\{d(x, z), d(y, z)\} \le d(x, y)\}$ for $x, y \in X$. It is then quite easy to compute them all.
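Computing the sets $B(x, y)$ only requires a loop over pairs of points. As an illustration, the sketch below applies this to the upper triangular part of Tab. 1 (the dissimilarity $q(d)$ of Section 3.1) and recovers the 17 clusters listed in Tab. 2. The function names and the data layout are assumptions made for this example.

```python
SPECIES = ["H. sapiens", "P. paniscus", "P. troglodytes",
           "G. gorilla", "P. pygmaeus", "H. lar"]

# Upper triangular part of Tab. 1, row by row (values of q(d)).
UPPER = [
    [0.19, 0.18, 0.24, 0.36, 0.51],
    [0.07, 0.23, 0.37, 0.51],
    [0.21, 0.37, 0.51],
    [0.38, 0.54],
    [0.51],
]


def from_upper_triangle(names, rows):
    """Build a symmetric nested dict q[a][b] with zero diagonal."""
    q = {a: {b: 0.0 for b in names} for a in names}
    for i, row in enumerate(rows):
        for k, value in enumerate(row):
            a, b = names[i], names[i + 1 + k]
            q[a][b] = q[b][a] = value
    return q


def two_balls(names, q):
    """Return {B(x, y): diameter} over all pairs x, y (x = y gives the singletons)."""
    balls = {}
    for x in names:
        for y in names:
            ball = frozenset(z for z in names
                             if max(q[x][z], q[y][z]) <= q[x][y])
            balls[ball] = max(balls.get(ball, 0.0), q[x][y])
    return balls


q = from_upper_triangle(SPECIES, UPPER)
clusters = two_balls(SPECIES, q)
for ball, diam in sorted(clusters.items(), key=lambda item: (item[1], len(item[0]))):
    print(f"{diam:4.2f}  " + "; ".join(s for s in SPECIES if s in ball))
print(len(clusters), "clusters")
```

The loop costs $O(|X|^3)$ comparisons, against the exponential enumeration needed when Definition 3 is applied to an arbitrary dissimilarity.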

Since parsimonious cluster systems are weak-hierarchical cluster systems, one can associate a unique quasi-ultrametric $d$ to any X-hypertree $(H, f)$ such that $f(A) = \mathrm{diam}_d(A)$ for any cluster $A$ of $H$.

Let then $d$ be a quasi-ultrametric whose associated cluster system $H$ is a parsimonious cluster system. It is clear that every threshold graph of $d$ (Definition 4) is chordal, because a non-chordal cycle $x_1 \dots x_p x_1$ would lead to $x_{i-1} \notin \delta_H(x_i, x_{i+1})$, $x_{i+1} \notin \delta_H(x_{i-1}, x_i)$ and $x_i \notin \delta_H(x_1, x_p)$ for all $1 < i < p$, which contradicts the definition of an X-hypertree. Finally, a dissimilarity whose associated cluster system is a parsimonious one is a chordal quasi-ultrametric (Brucker, 2001).

Definition 14. A dissimilarity $d$ on $X$ is a chordal quasi-ultrametric on $X$ if:

- $d$ is a quasi-ultrametric,
- every threshold graph of $d$ is chordal.

The converse is also true (Theorem 9), because Theorems 7 and 8 below imply that the set of dissimilarity clusters of a chordal quasi-ultrametric is a hypertree stable under restriction, and thus a parsimonious cluster system.

Theorem 7 (Brucker, 2001). The set of clusters of a chordal quasi-ultrametric is a hypertree.

Theorem 8 (Brucker and Barthélemy, 2007). Let $d$ be a quasi-ultrametric on $X$ and $Y \subseteq X$. The clusters of the restriction of $d$ to $Y$ are exactly the restrictions to $Y$ of the clusters of $d$.

Theorem 9. For any X-hypertree $(H, f)$ there exists a unique chordal quasi-ultrametric $d$ such that its dissimilarity clusters coincide with the clusters of $H$ and $\mathrm{diam}_d(A) = f(A)$ for any cluster $A$ of $H$. Conversely, for any chordal quasi-ultrametric $d$ there exists a unique X-hypertree $(H, f)$ such that its clusters coincide with the clusters of $d$ and $\mathrm{diam}_d(A) = f(A)$ for any dissimilarity cluster $A$ of $d$.
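Definition 14 can be checked mechanically on a small dissimilarity: test inequality (3) on all quadruples, then test chordality of the threshold graph at each distinct level. The following is a rough sketch under that reading; the function names are hypothetical, chordality is tested by simplicial-vertex elimination (a standard technique that is not taken from the paper), and both examples are assumptions built for illustration.

```python
from itertools import combinations, product


def is_quasi_ultrametric(points, d):
    """Four point inequality (3), checked on every quadruple x, y, z, t."""
    for x, y, z, t in product(points, repeat=4):
        if max(d[x][z], d[y][z]) <= d[x][y]:
            if d[z][t] > max(d[x][y], d[x][t], d[y][t]):
                return False
    return True


def is_chordal(vertices, adjacent):
    """Delete simplicial vertices one by one; chordal iff the graph is exhausted."""
    remaining = set(vertices)
    while remaining:
        for v in remaining:
            nbrs = [u for u in remaining if u != v and adjacent(u, v)]
            if all(adjacent(a, b) for a, b in combinations(nbrs, 2)):
                remaining.remove(v)
                break
        else:
            return False
    return True


def is_chordal_quasi_ultrametric(points, d):
    """Definition 14: quasi-ultrametric whose threshold graphs are all chordal."""
    levels = sorted({d[x][y] for x, y in combinations(points, 2)})
    return is_quasi_ultrametric(points, d) and all(
        is_chordal(points, lambda a, b, lv=lv: d[a][b] <= lv) for lv in levels)


def symmetric(points, values):
    d = {p: {q: 0 for q in points} for p in points}
    for (p, q), v in values.items():
        d[p][q] = d[q][p] = v
    return d


# An ultrametric on four points (two close pairs at 2, everything else at 3).
ultra = symmetric("xyzt", {("x", "y"): 2, ("z", "t"): 2, ("x", "z"): 3,
                           ("x", "t"): 3, ("y", "z"): 3, ("y", "t"): 3})
# Shortest-path distance on the 4-cycle a-b-c-e-a: its threshold graph at level 1 is the cycle.
cycle = symmetric("abce", {("a", "b"): 1, ("b", "c"): 1, ("c", "e"): 1,
                           ("a", "e"): 1, ("a", "c"): 2, ("b", "e"): 2})

print(is_chordal_quasi_ultrametric("xyzt", ultra))
print(is_quasi_ultrametric("abce", cycle), is_chordal_quasi_ultrametric("abce", cycle))
```

The 4-cycle example shows that the two conditions of Definition 14 are independent: the distance satisfies inequality (3) but one of its threshold graphs is a chordless cycle.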

One can finally use the results of Brucker (2001), who proved that tree distances are chordal quasi-ultrametrics; thus their dissimilarity clusters form a parsimonious cluster system (the converse is nevertheless false). Once again X-hypertrees meet X-trees.

2.3 Structure related characterization

Let $H = (X, E)$ be a cluster system. The couple $(E \cup \{\emptyset\}, \subseteq)$ is then a lattice.

Definition 15. A couple $T = (L, \le)$ is a lattice if $\le$ is an order relation on the set $L$ and for any $x, y \in L$:

- there exists a unique element $x \wedge y$ which is the largest element less than $x$ and $y$,
- there exists a unique element $x \vee y$ which is the smallest element larger than $x$ and $y$.

We will here only speak about finite lattices (the set $L$ of the lattice $T = (L, \le)$ is finite), so lattice and finite lattice have to be considered as synonyms in this paper. Each lattice $T = (L, \le)$ then has a smallest and a largest element, denoted by $0_T$ and $1_T$, respectively. For a lattice $T = (L, \le)$ with join and meet operators $\vee$ and $\wedge$, $x \in L$ is said to be:

- join-irreducible if $x = y \vee z$ implies $x = y$ or $x = z$,
- meet-irreducible if $x = y \wedge z$ implies $x = y$ or $x = z$,
- doubly irreducible if $x$ is both join- and meet-irreducible.

The lattices $(E \cup \{\emptyset\}, \subseteq)$ where $(X, E)$ is a cluster system are clearly in bijection with those lattices whose join-irreducible elements are exactly its atoms, in the following sense:

Definition 16. $x$ is an atom of a lattice $T = (L, \le)$ if $y < x$ implies $y = 0_T$.

Since we will here only speak about lattices whose join-irreducible elements are its atoms, we will assume that lattice and finite lattice whose join-irreducible elements are its atoms are synonyms. One can then associate to any lattice $T = (L, \le)$ a cluster system $H(T) = (X, E)$ where $X$ is the set of atoms of $T$ and $E = \{A(y) \mid y \in L \setminus \{0_T\}\}$, where $A(y)$ is the set of all the elements of $X$ less than or equal to $y$ in $T$. It is moreover clear that there is a one-to-one correspondence between $(E \cup \{\emptyset\}, \subseteq)$ and $(L, \le)$.

We will prove in this section that parsimonious cluster systems are in bijection with a well known lattice structure: dismantlable lattices.

Definition 17 (Rival, 1974). A lattice $T = (L, \le)$ is dismantlable if there exists a doubly irreducible element $x$ in $L$ such that the lattice $T' = (L \setminus \{x\}, \le)$ remains dismantlable.

In order to prove the bijection, we will use the characterization of Theorem 10.

Definition 18. Let $T = (L, \le)$ be a lattice. A crown is a partially ordered set $\{x_1, y_1, x_2, y_2, \dots, x_n, y_n\}$ in which $x_i < y_i$ and $y_i > x_{i+1}$ for $1 \le i \le n - 1$, $x_n < y_n$ and $x_1 < y_n$ are the only comparability relations.

Theorem 10 (Kelly and Rival, 1974). Dismantlable lattices are exactly the lattices with no crown.

Theorem 11. If $T = (L, \le)$ is a dismantlable lattice then $H(T)$ is a parsimonious cluster system and, conversely, if $H = (X, E)$ is a parsimonious cluster system then $(E \cup \{\emptyset\}, \subseteq)$ is a dismantlable lattice.

Proof. Let $T = (L, \le)$ be a dismantlable lattice and suppose that $H(T) = (X, E)$ is not a parsimonious cluster system. As in the proof of Theorem 5, there exist then $p > 1$, $p$ clusters $A_i \subseteq X$ ($1 \le i \le p$) of $H(T)$ and $p + 1$ objects $x_i \in X$ ($1 \le i \le p + 1$) for which for all $1 \le i \le p$: $x_i, x_{i+1} \in A_i$, and for all $1 < i < p$: $x_j \notin A_i$ for $j = i - 1, i + 2$, $x_3 \notin A_1$ and $x_{p-1} \notin A_p$, and a cluster $A_{p+1}$ containing $x_1$ and $x_{p+1}$ but not $x_j$ for some $1 < j < p + 1$. Let then $i_0$ be the largest integer less than $j$ such that $x_{i_0} \in A_{p+1}$ and $i_1$ the smallest integer larger than $j$ such that $x_{i_1} \in A_{p+1}$.

One can then extract a crown of $(E \cup \{\emptyset\}, \subseteq)$ from the ordered set $\{\{x_{i_0}\}, \delta_H(x_{i_0}, x_{i_0+1}), \{x_{i_0+1}\}, \delta_H(x_{i_0+1}, x_{i_0+2}), \dots, \{x_{i_1}\}, \delta_H(x_{i_1}, x_{i_0})\}$, which is a contradiction.

Conversely, let $H = (X, E)$ be a parsimonious cluster system and suppose that the lattice $(E \cup \{\emptyset\}, \subseteq)$ admits a crown $\{A_1, B_1, A_2, B_2, \dots, A_n, B_n\}$. One can assume that $n > 3$ since otherwise $H$ would not be a weak-hierarchical cluster system. Let $Y = X \setminus (B_n \cap B_2)$. Since $n > 3$, $\{A_1 \cap Y, B_1 \cap Y, A_2 \cap Y, B_2 \cap Y, \dots, A_n \cap Y, B_n \cap Y\}$ is a crown of $(E_Y \cup \{\emptyset\}, \subseteq)$ and $(B_n \cap Y) \cap (B_2 \cap Y) = \emptyset$. The graph $G = (E_Y, F)$, where $\{A, B\} \in F$ if and only if $A \cap B \ne \emptyset$, admits a cycle $(B_1 \cap Y, B_2 \cap Y, \dots, B_n \cap Y, B_1 \cap Y)$. Since $\{B_2 \cap Y, B_n \cap Y\} \notin F$, there exists in $G$ a non-chordal cycle containing $B_n \cap Y$, $B_1 \cap Y$ and $B_2 \cap Y$. According to Theorem 3, $H_Y$ is not a hypertree, thus by Theorem 6, $H$ is not a parsimonious cluster system: contradiction.

For instance, Fig. 5 shows the Hasse diagrams of the cluster systems from Fig. 4. There is a crown in Fig. 5 (a) (in bold), which explains why the cluster system of Fig. 5 (a) cannot be a parsimonious cluster system.

[Figure 5. (a) and (b) are Hasse diagrams of the cluster systems from Fig. 4 (a) and (b), respectively, drawn over the elements x_1, x_2, x_3, x_4. (a) admits a crown (in bold).]

3 Using X-hypertrees

3.1 Computing an X-hypertree

To compute an X-hypertree one can use an algorithm given in Brucker (2001), which computes a chordal quasi-ultrametric from any given dissimilarity $d$ on $X$ in $O(|X|^4)$ operations. For instance, the lower triangular matrix of Tab. 1 is the dissimilarity $d$ used in Bandelt and Dress (1989). It represents the dissimilarities among six Hominidae. The upper triangular matrix of Tab. 1 is the dissimilarity $q(d)$ obtained by the algorithm described in Brucker (2001).

Table 1. Dissimilarity $d$ between six Hominidae (lower triangular matrix) and dissimilarity $q(d)$ (upper triangular matrix). Columns follow the same order as rows.

H. sapiens       0     0.19  0.18  0.24  0.36  0.51
P. paniscus      0.19  0     0.07  0.23  0.37  0.51
P. troglodytes   0.18  0.07  0     0.21  0.37  0.51
G. gorilla       0.24  0.23  0.21  0     0.38  0.54
P. pygmaeus      0.36  0.37  0.37  0.38  0     0.51
H. lar           0.52  0.56  0.51  0.54  0.51  0

Computing the dissimilarity clusters of a chordal quasi-ultrametric is an easy task since it is a quasi-ultrametric. The dissimilarity clusters of $q(d)$ are listed in Tab. 2.

Table 2. Clusters of the dissimilarity $q(d)$ (upper triangular matrix in Tab. 1).

#    diameter  elements
1    0         H. lar
2    0         P. pygmaeus
3    0         H. sapiens
4    0         P. troglodytes
5    0         P. paniscus
6    0         G. gorilla
7    0.07      P. paniscus; P. troglodytes
8    0.18      P. troglodytes; H. sapiens
9    0.19      H. sapiens; P. paniscus; P. troglodytes
10   0.21      P. troglodytes; G. gorilla
11   0.23      P. paniscus; P. troglodytes; G. gorilla
12   0.24      H. sapiens; P. paniscus; P. troglodytes; G. gorilla
13   0.36      H. sapiens; P. pygmaeus
14   0.37      H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus
15   0.38      H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus; G. gorilla
16   0.51      H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus; H. lar
17   0.54      H. sapiens; P. paniscus; P. troglodytes; P. pygmaeus; G. gorilla; H. lar

3.2 Drawing an X-hypertree

In order to draw a parsimonious cluster system, several approaches are possible. The first one is to draw its Hasse diagram. For instance, Fig. 6 is the Hasse diagram of the clusters from Tab. 2.

[Figure 6. The Hasse diagram of the clusters from Tab. 2, drawn over H. lar, P. pygmaeus, H. sapiens, P. troglodytes, P. paniscus and G. gorilla.]

To avoid the non-planarity of the Hasse diagram, one can represent the clusters in 3 dimensions by using the spatial dendrograms of Batbedat (1990). Such an approach leads to a 3D representation of the Hasse diagram using the fact that each cluster of a parsimonious cluster system is a connected part of an underlying vertex tree (because parsimonious cluster systems are hypertrees). Nevertheless, for printing purposes, one has to project the 3D view onto a plane, losing the benefits of this method.

We will now show a third method of representing a parsimonious cluster system, which divides the clusters into two parts, a hierarchical one and an overlapping one. Fig. 7 shows the result for the clusters of Tab. 2.

[Figure 7. Cluster-oriented representation of the clusters from Tab. 2 (left), highlighting the elements from cluster 14 (right).]

In order to construct these two parts we consider, for a given X-hypertree $((X, E), f)$:

- for $A \in E \setminus \{X\}$, define $\mathrm{succs}(A)$ as the set of elements that cover $A$, that is, $\mathrm{succs}(A) = \min\{B \mid A \subset B,\ B \in E\}$ (the minimal clusters strictly containing $A$),
- for $A \in 2^X$, denote by $\sup(A)$ the smallest element (with respect to the inclusion order) of $E$ which includes (or equals) $A$. This element exists because $E$ is closed under intersection and $X \in E$.

The graph $G_H = (E, E_H)$, where $AB$ is an edge of $E_H$ if and only if $B = \sup(C)$ with $C$ the union of all the elements of $\mathrm{succs}(A)$, is a hierarchical tree rooted in $X$. This tree is the hierarchical part of Fig. 7, where each element $A$ of $E$ is drawn at height $f(A)$. The overlapping part consists of all the edges of the graph $G_O = (E, E_O)$, where $AB$ is an edge of $E_O$ if $AB \notin E_H$ and $B \in \mathrm{succs}(A)$.

For instance, cluster 14 in Fig. 7 contains all the dashed elements. Cluster 8 is not a singleton (it is composed of H. sapiens and P. troglodytes), but it is a leaf of the hierarchical tree $G_H$. Such clusters act as stand-ins for missing elements. Moreover, if the X-hypertree is a strong hierarchy, $G_O$ is an empty graph and $G_H$ is exactly the hierarchical tree. As Fig. 8 shows, interval hypergraphs have both a hierarchical and an overlapping part.

The graph $G_O$ can be interpreted either as a contamination of the strong hierarchy (elements are similar but in different ways, just like the example given in Section 2) or, from a phylogenetic point of view, as resulting from different rates of evolution. This representation can also be interpreted like the reticulograms proposed by Legendre and Makarenkov (2002).

3.3 Leaves of a parsimonious cluster system

The above section gives a first way of interpreting a parsimonious cluster system: a contamination of a strong hierarchy by overlapping clusters. This interpretation can nevertheless be used for any cluster system. One particularity of parsimonious cluster systems is the existence of leaves. We introduced them informally in Section 2 as elements in a border. Theorem 12 will formally define them. Recall that a subset $C$ of $2^X$ is a chain if and only if for all $A$ and $B$ in $C$, either $A \subseteq B$ or $B \subseteq A$.
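Both the construction of Section 3.2 and the chain condition just recalled can be tried out on the clusters of Tab. 2. The sketch below computes $\mathrm{succs}$, $\sup$, the parent of each cluster in $G_H$ and the edges of $G_O$, and lists the elements whose clusters form a chain (the leaves defined in Theorem 12). It is an illustrative sketch: the function names and the short variable names for the species are assumptions, and only the cluster contents are taken from Tab. 2.

```python
S, Pa, T, G, Py, L = ("H. sapiens", "P. paniscus", "P. troglodytes",
                      "G. gorilla", "P. pygmaeus", "H. lar")
X = [S, Pa, T, G, Py, L]
TAB2 = {  # cluster number in Tab. 2 -> elements
    1: {L}, 2: {Py}, 3: {S}, 4: {T}, 5: {Pa}, 6: {G},
    7: {Pa, T}, 8: {T, S}, 9: {S, Pa, T}, 10: {T, G}, 11: {Pa, T, G},
    12: {S, Pa, T, G}, 13: {S, Py}, 14: {S, Pa, T, Py},
    15: {S, Pa, T, Py, G}, 16: {S, Pa, T, Py, L}, 17: {S, Pa, T, Py, G, L},
}
E = {frozenset(c): n for n, c in TAB2.items()}  # cluster -> its number


def succs(A):
    """Minimal clusters strictly containing A (the clusters covering A)."""
    above = [B for B in E if A < B]
    return [B for B in above if not any(C < B for C in above)]


def sup(A):
    """Smallest cluster containing A; unique because E is closed under intersection."""
    return min((B for B in E if A <= B), key=len)


def hierarchical_parent(A):
    """Other end of the G_H edge leaving A, or None for the root X."""
    covers = succs(A)
    return sup(frozenset().union(*covers)) if covers else None


def overlapping_edges():
    """Edges of G_O as pairs of cluster numbers: covers that are not the G_H parent."""
    return {(E[A], E[B]) for A in E for B in succs(A)
            if B != hierarchical_parent(A)}


def leaves(clusters, elements):
    """Elements whose clusters, sorted by size, are nested (i.e. form a chain)."""
    found = []
    for x in elements:
        containing = sorted((A for A in clusters if x in A), key=len)
        if all(A <= B for A, B in zip(containing, containing[1:])):
            found.append(x)
    return found


parents = {E[A]: E[hierarchical_parent(A)] for A in E if succs(A)}
print("G_H parents:", dict(sorted(parents.items())))
print("G_O edges:", sorted(overlapping_edges()))
print("leaves:", leaves(E, X))

inner = [x for x in X if x not in leaves(E, X)]
inner_clusters = {A & frozenset(inner) for A in E if A & frozenset(inner)}
print("leaves after removing them:", leaves(inner_clusters, inner))
```

On this data, cluster 8 never appears as a parent in $G_H$, and the two rounds of leaves agree with the species named in the discussion following Theorem 13.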

Figure 8. Cluster-oriented representation of the interval hypergraph from Fig. 2.

Theorem 12. Let $H = (X, E)$ be a parsimonious cluster system. There exists an $x$ in $X$ such that the set of all clusters of $H$ containing $x$ is a chain. Such an element is called a leaf.

Proof. We use the bijection of Theorem 11. Let $H = (X, E)$ be a parsimonious cluster system and $T = (E \cup \{\emptyset\}, \subseteq)$ its associated dismantlable lattice. We prove the property by induction on $|X|$. For $|X| = 2$, the existence of a leaf is clear. Suppose that for $|X| = i$ any parsimonious cluster system on $X$ admits a leaf, and let $H = (X, E)$ be a parsimonious cluster system with $|X| = i + 1$. According to Theorem 11, the lattice $T = (E \cup \{\emptyset\}, \subseteq)$ is dismantlable, thus there exists a doubly irreducible atom in $T$. Let $\{x\}$ ($x \in X$) be this atom. Since $H_{X \setminus \{x\}}$ is also a parsimonious cluster system (Theorem 4), it satisfies the induction hypothesis; let $y$ be a leaf of $H_{X \setminus \{x\}}$. Two cases may then occur. Either $y$ is also a leaf of $H$ and the property holds, or $y$ is not a leaf of $H$. The second case implies that $\{x, y\}$ is a cluster of $H$, and since $\{x\}$ is a doubly irreducible atom of $T$, any cluster $A$ ($|A| > 1$) of $H$ containing $x$ also contains $y$. Thus $A \cap (X \setminus \{x\})$ is in the chain of clusters of $H_{X \setminus \{x\}}$ containing $y$: $x$ is a leaf of $H$, which concludes the proof.

Since the restriction of an X-hypertree is also an X-hypertree, it is clear that any parsimonious cluster system can be dismantled by iteratively removing a leaf (that is, by restricting the parsimonious cluster system to all its elements but the leaf). Leaves of an X-hypertree play the same role as leaves of a tree (or of an X-tree). Removing a leaf of an X-hypertree reveals another leaf. Moreover, using the same proof technique as for Theorem 12 one can prove:

Theorem 13. Any parsimonious cluster system admits at least 2 leaves.

For instance, the parsimonious cluster system of Tab. 2 contains 2 leaves: H. Lar and G. Gorilla. After removing these two leaves, P. Pygmaeus and P. Paniscus become leaves of the remaining parsimonious cluster system, and after removing them the last two elements, P. Troglodites and H. Sapiens, become leaves.

Leaves of a parsimonious cluster system have a practical use since they form the border of the cluster system. This idea is reinforced by the fact that the restriction of a parsimonious cluster system to the set of its leaves clearly forms a strong hierarchy. Removing the leaves of a parsimonious cluster system (border of order 1) reveals an inner border (border of order 2), and so on. In short, the complexity of an element is measured by the order of the border it belongs to. Finally, since the clusters containing a leaf form a chain, one can compare every element to a leaf: the earlier an element enters the chain, the closer it is to the leaf.

4 Conclusions

Parsimonious cluster systems and their valued version (X-hypertrees) are axiomatically designed to be cluster systems generalizing phylogenetic trees. They are in bijection with a dissimilarity model whose dissimilarity clusters can be easily computed. Moreover, a polynomial algorithm can approximate any given dissimilarity by a dissimilarity whose clusters form a parsimonious cluster system. Structurally, parsimonious cluster systems are in bijection with the well-studied family of dismantlable lattices. From a practical point of view, leaves of a parsimonious cluster system can be used like leaves in a tree (as a border containing the other elements, or using the chain of clusters to compare any element to them).

This is ongoing research, and a lot of work remains to be done to represent those cluster systems graphically, either by using a 3D program to represent them in a 3-dimensional space or by rearranging the clusters in order to provide a planar (or almost planar) representation.

5 References

Bandelt, H.-J. and Dress, A. W. M. (1989), Weak hierarchies associated with similarity measures: an additive clustering technique, Bulletin of Mathematical Biology, 51, 133-166.
Batbedat, A. (1990), Les approches pyramidales dans la classification arborée, Masson: Paris.
Barthélemy, J.-P. and Brucker, F. (2008), Binary clustering, Discrete Applied Mathematics, 156, 1237-1250.
Buneman, P. (1971), The recovery of trees from measures of dissimilarity, in Kendall D. G. and Tautu P. (eds), Mathematics in Archaeological and Historical Sciences, Edinburgh University Press, 387-395.
Brucker, F. (2001), Modèles de classification en classes empiétantes, PhD Thesis, EHESS: France.
Brucker, F. (2005), From hypertrees to arboreal quasi-ultrametrics, Discrete Applied Mathematics, 147, 3-26.
Brucker, F. (2006), Sub-dominant theory in numerical taxonomy, Discrete Applied Mathematics, 154, 1085-1099.
Brucker, F. and Barthélemy, J.-P. (2007), Eléments de Classification, Hermes Publishing: London.
Diatta, J. and Fichet, B. (1994), From Apresjan hierarchies and Bandelt-Dress weak hierarchies to quasi-hierarchies, in Diday E., Lechevallier Y., Schader M., Bertrand P. (eds), New Approaches in Classification and Data Analysis, Springer-Verlag: Berlin, 111-118.
Diatta, J. and Fichet, B. (1998), Quasi-ultrametrics and their 2-balls hypergraph, Discrete Mathematics, 192, 87-102.

Duchet, P. (1978), Propriétés de Helly et problèmes de représentations, in Problèmes combinatoires et théorie des graphes, Colloques internationaux du CNRS 260, 117-118.
Felsenstein, J. (1983), Numerical Taxonomy, Springer-Verlag: Berlin.
Flament, C. (1978), Hypergraphes arborés, Discrete Mathematics, 21, 223-227.
Kelly, D. and Rival, I. (1974), Crowns, fences, and dismantlable lattices, Canad. J. Math., 26, 1257-1271.
Legendre, P. and Makarenkov, V. (2002), Reconstruction of biogeographical and evolutionary networks using reticulograms, Systematic Biology, 51, 199-216.
Lepouliquen, M. (2008), Filiation de manuscrits sanskrits par méthodes issues, pour partie, de la phylogénétique, PhD Thesis, EHESS: Paris.
Rival, I. (1974), Lattices with doubly irreducible elements, Canadian Mathematical Bulletin, 17, 91-95.
Semple, C. and Steel, M. (2003), Phylogenetics, Oxford University Press: New York.
Saitou, N. and Nei, M. (1987), The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol., 4, 406-425.
Tversky, A. (1977), Features of similarity, Psychological Review, 84, 327-352.