A matroid associated with a phylogenetic tree

Discrete Mathematics and Theoretical Computer Science (DMTCS), vol. 16:2, 2014, 41-56

Andreas W.M. Dress (1), Katharina T. Huber (2), Mike Steel (3)

(1) CAS-MPG Partner Institute and Key Lab for Computational Biology, Shanghai, China
(2) School of Computing Sciences, University of East Anglia, Norwich, UK
(3) Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand

received 19th Dec. 2013, revised 30th May 2014, accepted June 2014.

This work was partly supported by NSFC grant 117155. Email: mike.steel@canterbury.ac.nz

ISSN 1365-8050, (c) 2014 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France

A (pseudo-)metric $D$ on a finite set $X$ is said to be a "tree metric" if there is a finite tree with leaf set $X$ and non-negative edge weights so that, for all $x, y \in X$, $D(x, y)$ is the path distance in the tree between $x$ and $y$. It is well known that not every metric is a tree metric. However, when some such tree exists, one can always find one whose interior edges have strictly positive edge weights and that has no vertices of degree 2, any such tree is, up to canonical isomorphism, uniquely determined by $D$, and one does not even need all of the distances in order to fully (re-)construct the tree's edge weights in this case. Thus, it seems of some interest to investigate which subsets of $\binom{X}{2}$ suffice to determine ("lasso") these edge weights. In this paper, we use the results of a previous paper to discuss the structure of a matroid that can be associated with an (unweighted) $X$-tree $T$ defined by the requirement that its bases are exactly the "tight edge-weight lassos" for $T$, i.e., the minimal subsets of $\binom{X}{2}$ that lasso the edge weights of $T$.

Keywords: phylogenetic tree, tree metric, matroid, lasso (for a tree), cord (of a lasso)

1 Introduction

Given any finite tree $T$ without vertices of degree 2, there is an associated matroid $\mathbb{M}(T)$ having ground set $\binom{X}{2}$ where $X$ is the set of leaves of $T$. In this paper, we describe this matroid and investigate a number of interesting properties it exhibits.

The motivation for studying this matroid is its relevance to the problem of uniquely reconstructing an edge-weighted tree from its topology and just some of the leaf-to-leaf distances in that tree. This combinatorial problem arises in phylogenetics (the inference of evolutionary relationships from genetic data) since, due to patchy taxon coverage by available genetic loci (Sanderson et al., 2010), reliable estimates of evolutionary distances can often be obtained only for some pairs of species. In (Dress et al., 2011), we already introduced and explored related mathematical questions. We asked when knowing just some of the leaf-to-leaf distances is sufficient to uniquely determine or, as we say, "lasso" the topology of the tree, or its edge weights, or both (i).

Footnote (i): Our term lasso, introduced in (Dress et al., 2011), refers to the analogy of a cowboy catching an unruly beast (in our case a tree) with limited means. The term has no relationship to the use of the word in machine learning, or in linear regression.

In this paper, we turn our attention to a fixed (un-weighted) tree $T$ and the set of minimal subsets $L$ of $\binom{X}{2}$ for which the leaf-to-leaf distances between all $x, y \in X$ with $\{x, y\} \in L$ relative to some edge-weighting $\omega$ of $T$ suffice to determine all the other distances relative to $\omega$ and, thus, the edge-weighting $\omega$. Indeed, these subsets form the bases of the matroid $\mathbb{M}(T)$ that will be studied here.

We begin by recalling some basic definitions and some relevant terminology from (Dress et al., 2011) on trees, lassos, and associated concepts (readers unfamiliar with basic matroid theory may wish to consult (Welsh, 1976), though even Wikipedia may suffice). We then define $\mathbb{M}(T)$ and describe some of its basic properties before presenting our main results. Finally, we provide a number of remarks, observations, and questions for possible further study.

2 Some terminology and basic facts

We will assume throughout that $X$ is a finite set of cardinality $n \ge 3$ and, for any two elements $x, y \in X$, we will usually write just $xy$ instead of $\{x, y\}$, and we will refer to any such set as a cord whenever $x \ne y$ holds. Throughout this paper, we will assume that $T = (V, E)$ is an $X$-tree, i.e., a finite tree with vertex set $V$, leaf set $X \subseteq V$, and edge set $E \subseteq \binom{V}{2}$ that has no vertices of degree 2. Two $X$-trees $T_1 = (V_1, E_1)$ and $T_2 = (V_2, E_2)$ are said to be equivalent if there exists a bijection $\varphi : V_1 \to V_2$ with $\varphi(x) = x$ for all $x \in X$ and $E_2$ coincides with $\{\{\varphi(u), \varphi(v)\} : \{u, v\} \in E_1\}$, in which case we will also write $T_1 \simeq T_2$. In case every interior vertex of an $X$-tree $T$ (that is, every vertex in $V - X$) has degree 3, $T$ will also be said to be a binary $X$-tree.

Further, given any two vertices $u, v$ of $T$, we denote by $[u, v]$ the set of all vertices on the path in $T$ from $u$ to $v$ and by $E(u|v) = E_T(u|v)$ the set of all edges $e$ in $E$ on that path so that $[u, v] = \bigcup_{e \in E(u|v)} e$ always holds. For each $e \in E$, we denote by $\omega_e$ the map $E \to \mathbb{R} : f \mapsto \delta_{ef}$ (where $\delta$ is, of course, the Kronecker delta function). And for all $e \in E$ and $xy \in \binom{X}{2}$, we put $\delta^{e}_{xy} := 1$ in case $e \in E(x|y)$ and $\delta^{e}_{xy} := 0$ otherwise.

Here, given an $X$-tree $T = (V, E)$, we will be mainly concerned with the $\mathbb{R}$-linear map from $\mathbb{R}^E$ to $\mathbb{R}^{\binom{X}{2}}$ defined by $\omega \mapsto \omega^T$, where

$\omega^T(xy) = \sum_{e \in E(x|y)} \omega(e) = \sum_{e \in E} \omega(e)\, \delta^{e}_{xy}$,

and the associated $\binom{X}{2}$-labeled family of linear forms

$\lambda_{xy} = \lambda^T_{xy} : \mathbb{R}^E \to \mathbb{R} : \omega \mapsto \omega^T(xy) \quad (xy \in \binom{X}{2})$.

Let $D_\omega = D_{(T,\omega)}$ denote the map from $X \times X$ to $\mathbb{R}$ associated to the edge weighting $\omega$ and defined by $(x, y) \mapsto D_\omega(x, y) := \sum_{e \in E(x|y)} \omega(e)$. Note that $\lambda_{xy}(\omega_e) = \delta^{e}_{xy}$ holds for all $e \in E$ and all $x, y \in X$, and that $D_\omega(x, y) = \lambda_{xy}(\omega) = \omega^T(xy)$ holds for all $\omega \in \mathbb{R}^E$ and $x, y \in X$.
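As a small illustration of these definitions, the following Python sketch computes the edge sets $E(x|y)$, the indicators $\delta^{e}_{xy}$ and the distances $D_\omega(x, y)$ for the quartet tree with leaves a, b, c, d. The interior vertex names `u` and `v`, the helper names and the edge weights are hypothetical choices made for this example only.

```python
from itertools import combinations

# Quartet tree ab|cd; the interior vertices "u", "v" are illustrative names.
EDGES = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
LEAVES = ["a", "b", "c", "d"]

def path_edges(edges, x, y):
    """Return E(x|y): the edges on the unique path from x to y in the tree."""
    adj = {}
    for p, q in edges:
        adj.setdefault(p, []).append(q)
        adj.setdefault(q, []).append(p)
    parent = {x: None}            # depth-first search rooted at x
    stack = [x]
    while stack:
        p = stack.pop()
        for q in adj[p]:
            if q not in parent:
                parent[q] = p
                stack.append(q)
    path = set()
    while parent[y] is not None:  # walk back from y towards x
        path.add(frozenset((y, parent[y])))
        y = parent[y]
    return path

def delta(edges, e, x, y):
    """delta^e_{xy}: 1 if edge e lies on the path from x to y, else 0."""
    return 1 if frozenset(e) in path_edges(edges, x, y) else 0

def distance(edges, omega, x, y):
    """D_omega(x, y) = omega^T(xy): sum of omega(e) * delta^e_{xy} over all edges e."""
    return sum(omega[frozenset(e)] * delta(edges, e, x, y) for e in edges)

# Arbitrary example weights (an assumption), listed in the order of EDGES.
omega = {frozenset(e): w for e, w in zip(EDGES, [1, 2, 3, 4, 5])}
D = {x + y: distance(EDGES, omega, x, y) for x, y in combinations(LEAVES, 2)}
print(D)
```

The dictionary `D` is exactly the vector $\omega^T$ for this choice of $\omega$, indexed by the six cords.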

When $\omega$ is a non-negative edge weighting, $D_\omega$ is nothing but the associated (pseudo-)metric on $X$ induced by the edge-weighted tree $\mathcal{T} = (T, \omega)$ and much studied in phylogenetic analysis. Recall also that, given an arbitrary metric $D : X \times X \to \mathbb{R}_{\ge 0}$ defined on $X$, the metric $D$ is dubbed a tree metric if it is of the form $D_{(T,\omega)}$ for some $X$-tree $T = (V, E)$ and some non-negative edge weighting $\omega : E \to \mathbb{R}_{\ge 0}$ of $T$ which, in turn, holds if and only if $D$ satisfies the well-known four-point condition stating that, for all $a, b, c, d$ in $X$, the larger two of the three distance sums $D(a, b) + D(c, d)$, $D(a, c) + D(b, d)$, $D(a, d) + D(b, c)$ coincide. In this case, one can actually always find an $X$-tree $T$ and an edge weighting $\omega$ of $T$ with $D = D_{(T,\omega)}$ such that $\omega$ is strictly positive on all interior edges, in which case $\omega$ is called a proper edge weighting of $T$; any such pair $(T, \omega)$ is, up to canonical isomorphism, uniquely determined by $D$, and one does not even need to know the values of $D$ for all cords $xy$ in $\binom{X}{2}$ in order to determine all the other distances and, thus, the edge-weighting $\omega$ in this case.

In this note, we continue the investigation that we began in (Dress et al., 2011) of those subsets $L$ of $\binom{X}{2}$ for which, given the $X$-tree $T$, already the restriction $\omega^T|_L$ of the map $\omega^T$ to $L$ suffices to determine or "lasso" the edge weighting $\omega$ of $T$. To this end, we denote, for any subset $L$ of $\binom{X}{2}$,

by $\langle L \rangle = \langle L \rangle_T$ the $\mathbb{R}$-linear subspace of the dual vector space $\widehat{\mathbb{R}^E} := \mathrm{Hom}_{\mathbb{R}}(\mathbb{R}^E, \mathbb{R})$ of the space $\mathbb{R}^E$ generated by the maps $\lambda_{xy}$ with $xy \in L$,
by $\mathrm{rk}(L) = \mathrm{rk}_T(L) := \dim_{\mathbb{R}} \langle L \rangle$ the dimension of $\langle L \rangle$, and
by $\Gamma(L) := (X, L)$ the graph with vertex set $X$ and edge set $L$.

Following the conventions introduced in (Dress et al., 2011), we will refer to a subset $L$ of $\binom{X}{2}$ as being connected, disconnected or bipartite etc. whenever the graph $\Gamma(L)$ is connected, disconnected, or bipartite and so on, a connected component of $\Gamma(L)$ will also be called a connected component of $L$, and given any two subsets $A, B$ of $X$, the subset $\{ab : a \in A, b \in B\}$ of $\binom{X}{2}$ will be denoted by $A \vee B$ so that a subset $L$ of $\binom{X}{2}$ is bipartite if and only if there exist two disjoint subsets $A, B$ of $X$ with $L \subseteq A \vee B$.

Further, a subset $L$ of $\binom{X}{2}$ will be called

an edge-weight lasso for $T$ if the implication $\omega_1^T|_L = \omega_2^T|_L \Rightarrow \omega_1 = \omega_2$ holds for any two proper edge-weightings $\omega_1, \omega_2 : E \to \mathbb{R}_{\ge 0}$ of $T$,
a topological lasso for $T$ if the implication $\omega_1^T|_L = \omega_2^{T'}|_L \Rightarrow T \simeq T'$ holds for any $X$-tree $T'$ and any proper edge-weightings $\omega_1$ of $T$ and $\omega_2$ of $T'$, respectively, and
a strong lasso for $T$ if it is simultaneously an edge-weight and a topological lasso for $T$.

Next, recall (see e.g. (Oxley, 2011; Welsh, 1976)) that an abstract matroid $\mathbb{M}$ with a ground set, say, $M$ can be defined in terms of its rank function $\mathrm{rk}_{\mathbb{M}} : \mathcal{P}(M) \to \mathbb{N}_0$ (with $\mathcal{P}(M)$ denoting the power set of $M$) as well as by

the collection $\mathcal{I}_{\mathbb{M}} = \{I \subseteq M : \mathrm{rk}_{\mathbb{M}}(I) = |I|\}$ of its independent sets,
the collection $\mathcal{G}_{\mathbb{M}} = \{L \subseteq M : \mathrm{rk}_{\mathbb{M}}(L) = \mathrm{rk}_{\mathbb{M}}(M)\}$ of its generating sets,
the collection $\mathcal{B}_{\mathbb{M}}$ of its bases, i.e., the maximal sets in $\mathcal{I}_{\mathbb{M}}$ or, just as well, the minimal sets in $\mathcal{G}_{\mathbb{M}}$,
the collection $\mathcal{C}_{\mathbb{M}}$ of its circuits, i.e., the minimal sets in $\mathcal{P}(M) - \mathcal{I}_{\mathbb{M}}$,
as well as the closure operator $[\ldots]_{\mathbb{M}} : \mathcal{P}(M) \to \mathcal{P}(M) : L \mapsto [L]_{\mathbb{M}} := \{m \in M : \mathrm{rk}_{\mathbb{M}}(L) = \mathrm{rk}_{\mathbb{M}}(L \cup \{m\})\}$ associated to $\mathbb{M}$.

Here, given any $X$-tree $T$, we want to investigate the matroid $\mathbb{M}(T)$ with ground set $M := \binom{X}{2}$ associated to $T$ whose rank function $\mathrm{rk}_{\mathbb{M}(T)}$ is the map $\mathrm{rk}_T : \mathcal{P}(\binom{X}{2}) \to \mathbb{N}_0$ defined just above, i.e., the matroid that is represented (over $\mathbb{R}$, again see e.g. (Oxley, 2011; Welsh, 1976)) by the map $\lambda^T : \binom{X}{2} \to \widehat{\mathbb{R}^E} : xy \mapsto \lambda_{xy}$. We will denote by $\mathcal{I}(T) := \mathcal{I}_{\mathbb{M}(T)}$ its collection of independent sets, by $\mathcal{G}(T) := \mathcal{G}_{\mathbb{M}(T)}$ its collection of generating sets, by $\mathcal{B}(T) := \mathcal{B}_{\mathbb{M}(T)}$ its collection of bases, by $\mathcal{C}(T) := \mathcal{C}_{\mathbb{M}(T)}$ its collection of circuits and, given any subset $L$ of $\binom{X}{2}$, we denote by $[L]_T := [L]_{\mathbb{M}(T)}$ the ($T$-)closure of $L$ relative to $\mathbb{M}(T)$.

An alternative description of $\mathbb{M}(T)$ is as the matroid associated with the (0, 1)-matrix $A(T) = [A_{(e,xy)}]$, where the row index $e$ ranges over the set $E$ of edges of $T$, and the column index $xy$ ranges over $\binom{X}{2}$, with

$A_{(e,xy)} := 1$ if $e \in E(x|y)$, and $A_{(e,xy)} := 0$ otherwise.

By (Dress et al., 2011, Theorem 1), it follows that for any subset $L$ of $\binom{X}{2}$ the following are equivalent.

(a) $L$ is an edge-weight lasso for an $X$-tree $T = (V, E)$;
(b) the implication $\omega_1^T|_L = \omega_2^T|_L \Rightarrow \omega_1 = \omega_2$ holds not only for any two proper edge weightings $\omega_1, \omega_2$ of $T$, but for any two maps $\omega_1, \omega_2 \in \mathbb{R}^E$;
(c) $\langle L \rangle$ coincides with $\widehat{\mathbb{R}^E}$;
(d) $L \in \mathcal{G}(T)$ or, equivalently, $\mathrm{rk}_T(L) = \mathrm{rk}_T(\binom{X}{2}) = |E|$ holds.

In particular, an edge-weight lasso $L$ for $T$ is a tight edge-weight lasso for $T$, i.e., a minimal subset of $\binom{X}{2}$ that is an edge-weight lasso for $T$, if and only if its cardinality coincides with $|E|$ if and only if it is a basis of $\mathbb{M}(T)$, that is, $L \in \mathcal{B}(T)$ holds.

Particular types of $X$-trees that will play an important role in this paper are shown in Figure 1. They comprise

(i) the star trees, i.e., $X$-trees that have just one interior vertex and, hence, are equivalent to the tree $T^{(X)} := (V^{(X)}, E^{(X)})$ with leaf set $X$, vertex set $V^{(X)} := X \cup \{*\}$ and edge set $E^{(X)} := \{\{x, *\} : x \in X\}$ where $*$ denotes just some arbitrary, but fixed element not in $X$;
(ii) quartet trees, i.e., binary $X$-trees that have four leaves (with $T_{ab|cd}$ denoting the quartet tree with leaf set $\{a, b, c, d\}$ whose central edge, which will also be denoted by $e_{ab|cd}$, separates the leaves $a, b$ from $c, d$), and
(iii) caterpillar trees, i.e., binary $X$-trees $T = (V, E)$ containing two interior vertices $u, v \in V$ with $[u, v] = V - X$.
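The rank criterion (d) can be tried out numerically. The following Python sketch, written for the quartet tree $T_{ab|cd}$ only, stores the transpose of the matrix $A(T)$ as a hand-written table and tests whether a set of cords is a (tight) edge-weight lasso by comparing its rank with $|E| = 5$. The table layout and all function names are assumptions made for this illustration; the rank is computed by exact Gauss-Jordan elimination over the rationals.

```python
from fractions import Fraction

# One row per cord, columns ordered as (e_a, e_b, e_c, e_d, e_ab|cd).
# This is the transpose of A(T) for the quartet tree ab|cd, written out by hand.
ROWS = {
    "ab": [1, 1, 0, 0, 0],
    "ac": [1, 0, 1, 0, 1],
    "ad": [1, 0, 0, 1, 1],
    "bc": [0, 1, 1, 0, 1],
    "bd": [0, 1, 0, 1, 1],
    "cd": [0, 0, 1, 1, 0],
}
NUM_EDGES = 5

def rank(rows):
    """Rank over the rationals, using exact fractions to avoid rounding issues."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def rk_T(cords):
    """rk_T(L): the dimension of the span of the forms lambda_xy with xy in L."""
    return rank([ROWS[c] for c in cords])

def is_edge_weight_lasso(cords):
    """Criterion (d): L is an edge-weight lasso iff rk_T(L) = |E|."""
    return rk_T(cords) == NUM_EDGES

def is_tight(cords):
    """A lasso is tight iff, in addition, it has exactly |E| cords."""
    return is_edge_weight_lasso(cords) and len(cords) == NUM_EDGES

print(is_tight(["ab", "ad", "bc", "bd", "cd"]))              # omits the cord ac
print(is_edge_weight_lasso(["ac", "ad", "bc", "bd", "cd"]))  # omits the cord ab
```

Using `Fraction` rather than floating point keeps the rank computation exact, which matters because the criterion is an equality of integers.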

Fig. 1: (i) A star tree with leaf set $X_4 := \{a, b, c, d\}$; (ii) a binary $X_4$-tree (up to equivalence, there are two more binary $X_4$-trees); (iii) a caterpillar $X_n$-tree for $X_n := \{a_1, a_2, \ldots, a_{n-1}, a_n\}$.

3 Star trees

We first consider the simplest type of $X$-tree, i.e., the star tree $T^* := T^{(X)}$ with leaf set $X$ (cf. Figure 1 (i)), for which the associated (0, 1)-matrix $A(T^*)$ has just two non-zero entries in each column. In this case, the associated matroid $\mathbb{M}(T^*)$ is well known: It is easily seen to exactly coincide with the biased matroid of the complete signed graph $(X, \binom{X}{2})$ with vertex set $X$ all of whose edges have sign $-1$. In consequence (see e.g. (Oxley, 2011, Section 6.10) and the references therein to Zaslavsky's papers on signed graphic matroids), the following results are known to hold:

Proposition 3.1 Given a finite set $X$ of cardinality $n \ge 3$, the following holds for the matroid $\mathbb{M}(T^*)$ associated to the star tree $T^*$ with leaf set $X$:

(i) The collection $\mathcal{G}(T^*)$ of all edge-weight lassos for $T^*$ coincides with the collection of all strongly non-bipartite subsets $L$ of $\binom{X}{2}$, i.e., all subsets $L$ of $\binom{X}{2}$ for which none of the connected components of $L$ is bipartite.

(ii) The collection $\mathcal{B}(T^*)$ of all tight edge-weight lassos for $T^*$ coincides with the collection of all minimal strongly non-bipartite subsets $L$ of $\binom{X}{2}$, i.e., all subsets $L$ of $\binom{X}{2}$ for which each connected component of $L$ contains exactly one circle (ii) and the length of this circle has odd parity.

(iii) The collection $\mathcal{I}(T^*)$ of all independent subsets of $\mathbb{M}(T^*)$ coincides with the collection of all subsets $L$ of $\binom{X}{2}$ for which each connected component of $L$ is either a tree or contains exactly one circle and the length of this circle has odd parity.

(iv) The collection $\mathcal{C}(T^*)$ of all circuits of $\mathbb{M}(T^*)$ coincides with the collection of all subsets $L$ of $\binom{X}{2}$ that either form a circle of even length or a pair of circles of odd length together with a connecting simple path, such that the two circles are either disjoint (then the connecting path has one end in common with each circle and is otherwise disjoint from both) or share just a single common vertex (in this case the connecting path is that single vertex).

(v) The co-rank $n - \mathrm{rk}_{T^*}(L)$ of a subset $L$ of $\binom{X}{2}$ relative to $\mathbb{M}(T^*)$ coincides with the number of bipartite connected components of $L$.

(vi) The closure $[L]_{T^*}$ of a subset $L$ of $\binom{X}{2}$ relative to $\mathbb{M}(T^*)$ coincides with the union of (a) the edge set of the complete graph whose vertex set is the union of the vertex sets of all non-bipartite connected components of $L$ and (b) all subsets of the form $A \vee B$ for which some bipartite connected component $L'$ of $L$ with $L' \subseteq A \vee B$, $A \cup B = \bigcup_{xy \in L'} \{x, y\}$, and $A \cap B = \emptyset$ exists.

Footnote (ii): In our context, we adopt the convention of calling a graph (and, hence, also every subgraph of a graph) a circle if it is connected and every vertex in that graph has degree 2.

4 A recursive approach for computing $\mathcal{B}(T)$

Every $X$-tree can be reduced by a sequence of edge contractions to a star tree (one may even insist that at each stage, one of the two subtrees incident with the edge being contracted has only one non-leaf vertex, though we do not require this here). Thus, Proposition 3.1 can be used as the basis for a recursive description of the matroid associated with any $X$-tree, provided that one can describe, for any $X$-tree $T$, how to obtain $\mathcal{B}(T)$ from $\mathcal{B}(T/f)$ where $f$ is any interior edge of $T$, and $T/f$ is the $X$-tree obtained from $T$ by collapsing the edge $f$. We provide such a description shortly, in Proposition 4.2, using the following lemma, the statement of which relies on further terminology which we introduce next.
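Since Proposition 3.1 serves as the base case of this recursion, it may help to see its part (ii) in executable form. The Python sketch below tests whether a set of cords is a tight edge-weight lasso for a star tree by checking that every connected component of $\Gamma(L)$ is connected with as many cords as vertices (so that it contains exactly one circle) and is not bipartite (so that this circle is odd). This is one possible way to phrase the criterion; the function names are hypothetical.

```python
from itertools import combinations

def components(vertices, cords):
    """Connected components of the graph (vertices, cords), plus its adjacency map."""
    adj = {v: set() for v in vertices}
    for x, y in cords:
        adj[x].add(y)
        adj[y].add(x)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            p = stack.pop()
            for q in adj[p]:
                if q not in comp:
                    comp.add(q)
                    stack.append(q)
        seen |= comp
        comps.append(comp)
    return comps, adj

def is_bipartite(comp, adj):
    """Try to 2-colour one connected component; fail on a monochromatic edge."""
    start = next(iter(comp))
    colour = {start: 0}
    stack = [start]
    while stack:
        p = stack.pop()
        for q in adj[p]:
            if q not in colour:
                colour[q] = 1 - colour[p]
                stack.append(q)
            elif colour[q] == colour[p]:
                return False
    return True

def is_tight_star_lasso(vertices, cords):
    """Proposition 3.1 (ii): each component has exactly one circle, of odd length."""
    comps, adj = components(vertices, cords)
    for comp in comps:
        n_cords = sum(1 for x, _ in cords if x in comp)
        if n_cords != len(comp) or is_bipartite(comp, adj):
            return False
    return True

X = "abcd"
all_cords = list(combinations(X, 2))
star_bases = [L for L in combinations(all_cords, 4) if is_tight_star_lasso(X, L)]
print(len(star_bases))
```

For four leaves, the sets found this way are the triangles with one pendant cord attached, while the three 4-cycles are rejected because they are bipartite.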
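Suppose $T = (V, E)$ is an $X$-tree and $F$ is a subset of the set of interior edges of $T$. Then we denote by $T/F$ the $X$-tree obtained by collapsing all edges in $F$. Furthermore, we denote by $\lambda|_{E-F}$ the restriction of a map $\lambda \in \widehat{\mathbb{R}^E}$ to the space $\mathbb{R}^{E-F}$ relative to the canonical embedding $\mathbb{R}^{E-F} \to \mathbb{R}^E : \omega \mapsto \omega^{(F \to 0)}$ defined by extending each map $\omega \in \mathbb{R}^{E-F}$ to the map $\omega^{(F \to 0)}$ by putting $\omega^{(F \to 0)}(e) := 0$ for all $e \in F$.

Lemma 4.1 Suppose given any $X$-tree $T = (V, E)$, any subset $F$ of the set of interior edges of $T$, any map $\lambda \in \widehat{\mathbb{R}^E}$, and any map $\rho : \binom{X}{2} \to \mathbb{R}$ with $\lambda = \sum_{xy \in \binom{X}{2}} \rho(xy)\, \lambda^T_{xy}$. Then, $\lambda|_{E-F} = \sum_{xy \in \binom{X}{2}} \rho(xy)\, \lambda^{T/F}_{xy}$, i.e., one has $\lambda(\omega) = \sum_{xy \in \binom{X}{2}} \rho(xy)\, \lambda^{T/F}_{xy}(\omega|_{E-F})$ for all maps $\omega \in \mathbb{R}^E$ with $\omega(f) = 0$ for all $f \in F$. In particular, if $L$ is an edge-weight lasso for $T$, then $L$ is also an edge-weight lasso for the $X$-tree $T/F$. More generally,

$\mathrm{rk}_T(L) = \mathrm{rk}_{T/F}(L) + \dim\{\lambda \in \langle L \rangle_T : \lambda(\omega_e) = 0 \text{ for all } e \in E - F\} \le \mathrm{rk}_{T/F}(L) + |F| \qquad (1)$

holds for every subset $L$ of $\binom{X}{2}$ and any subset $F$ of the set of interior edges of $T$.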
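Inequality (1) can be checked by brute force on a small case. The Python sketch below takes the quartet tree $T_{ab|cd}$ with $F$ consisting of its single interior edge, so that $T/F$ is the star tree on four leaves. It uses the observation from the lemma that passing from $T$ to $T/F$ amounts to restricting each $\lambda^T_{xy}$ to $E - F$, which here means dropping the column of the interior edge. The hand-written table and the function names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Rows of the 0/1 table for the quartet tree ab|cd, one per cord,
# with columns (e_a, e_b, e_c, e_d, f) where f is the interior edge.
ROWS = {
    "ab": [1, 1, 0, 0, 0],
    "ac": [1, 0, 1, 0, 1],
    "ad": [1, 0, 0, 1, 1],
    "bc": [0, 1, 1, 0, 1],
    "bd": [0, 1, 0, 1, 1],
    "cd": [0, 0, 1, 1, 0],
}
F_COLUMN = 4  # index of the column belonging to the interior edge f

def rank(rows):
    """Rank over the rationals via exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def rk_T(cords):
    return rank([ROWS[c] for c in cords])

def rk_T_mod_f(cords):
    """Rank relative to T/f: restrict every row to the edges other than f."""
    return rank([ROWS[c][:F_COLUMN] for c in cords])

# Collect rk_T(L) - rk_{T/f}(L) over all non-empty subsets L of cords.
gaps = set()
for size in range(1, len(ROWS) + 1):
    for L in combinations(ROWS, size):
        gaps.add(rk_T(L) - rk_T_mod_f(L))
print(gaps)
```

With $|F| = 1$, inequality (1) predicts that the difference of the two ranks is always 0 or 1, which is what the set `gaps` records.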

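Proof: The first part follows directly from the definitions and implies that $\lambda^{T/F}_{xy} = \lambda^T_{xy}|_{E-F}$ holds for all $xy \in \binom{X}{2}$. In particular, as the map $\widehat{\mathbb{R}^E} \to \widehat{\mathbb{R}^{E-F}} : \lambda \mapsto \lambda|_{E-F}$ is surjective, the maps $\lambda^{T/F}_{xy}$ ($xy \in L$) must generate $\widehat{\mathbb{R}^{E-F}}$ whenever the maps $\lambda^T_{xy}$ ($xy \in L$) generate $\widehat{\mathbb{R}^E}$ while, more generally, they generate a space whose dimension coincides with the difference of $\mathrm{rk}_T(L)$ and the dimension of the kernel of the map $\langle L \rangle_T \to \widehat{\mathbb{R}^{E-F}} : \lambda \mapsto \lambda|_{E-F}$.

Proposition 4.2 Given an $X$-tree $T$, an interior edge $f$ of $T$, a pair $xy \in \binom{X}{2}$, and a basis $B'$ of $\mathbb{M}(T/f)$, let $\rho_{xy} \in \mathbb{R}^{B'}$ denote the unique map in $\mathbb{R}^{B'}$ with $\lambda^{T/f}_{xy} = \sum_{b \in B'} \rho_{xy}(b)\, \lambda^{T/f}_{b}$. Then, $\mathcal{B}(T)$ coincides with the set

$\mathcal{B}_f := \{ \{xy\} \cup B' : xy \in \binom{X}{2} - B',\ B' \in \mathcal{B}(T/f), \text{ and } \sum_{b \in B'} \rho_{xy}(b)\, \delta^{f}_{b} \ne \delta^{f}_{xy} \}$.

Proof: By Lemma 4.1, there exists, for each $B \in \mathcal{B}(T)$, some $B' \subseteq B$ with $B' \in \mathcal{B}(T/f)$. Thus, each element of $\mathcal{B}(T)$ is of the form $B' \cup \{xy\}$ for some $xy \in \binom{X}{2} - B'$ and some $B' \in \mathcal{B}(T/f)$. So, $\mathcal{B}(T) \subseteq \{ \{xy\} \cup B' : xy \in \binom{X}{2} - B',\ B' \in \mathcal{B}(T/f) \}$ must clearly hold. Now suppose that $xy \in \binom{X}{2} - B'$ and $B' \in \mathcal{B}(T/f)$ holds. Then, denoting by $\Lambda_{B',xy}$ the space of all maps $\lambda \in \langle \{xy\} \cup B' \rangle_T$ with $\lambda|_{E-\{f\}} = 0$, it follows from (1) that $\{xy\} \cup B' \in \mathcal{B}(T) \Leftrightarrow \dim \Lambda_{B',xy} = 1$ holds while, by construction, we have $\dim \Lambda_{B',xy} = 1$ if and only if there exists some non-zero map in $\Lambda_{B',xy}$. However, given any map $\rho \in \mathbb{R}^{B'}$ and any real number $c$, it follows from the fact that, by definition, $\lambda^T_{zz'}|_{E-\{f\}}$ coincides with $\lambda^{T/f}_{zz'}$ for all $zz' \in \binom{X}{2}$, that one has $\lambda_{c,\rho} := c\lambda^T_{xy} + \sum_{b \in B'} \rho(b)\, \lambda^T_{b} \in \Lambda_{B',xy}$ if and only if $c\lambda^{T/f}_{xy} + \sum_{b \in B'} \rho(b)\, \lambda^{T/f}_{b}$ vanishes and, hence, if and only if $\rho(b) = -c\rho_{xy}(b)$ holds for all $b \in B'$. Since a map $\lambda_{c,\rho}$ in $\Lambda_{B',xy}$ is non-zero exactly when $\lambda_{c,\rho}(\omega_f) \ne 0$ holds, one has $\{xy\} \cup B' \in \mathcal{B}(T)$ if and only if $\lambda_{c,\rho}(\omega_f) \ne 0$ holds for $\rho := -c\rho_{xy}$ and some $c \ne 0$ and, hence, if and only if $\sum_{b \in B'} \rho_{xy}(b)\, \delta^{f}_{b} \ne \delta^{f}_{xy}$ holds, as claimed.

Remark: Suppose that $T = (V, E)$ is an $X$-tree and that $U \subseteq V$ is a $T$-core as defined in (Dress et al., 2011, Section 5), i.e., a non-empty subset of $V$ for which the induced subgraph $T_U := (U, E_U := \{e \in E : e \subseteq U\})$ of $T$ with vertex set $U$ is connected (and, hence, a tree) and the degree $\deg_{T_U}(v)$ of any vertex $v$ in $T_U$ is either 1 or coincides with the degree $\deg_T(v)$ of $v$ in $T$. Then, the rank $\mathrm{rk}_T(L)$ of a subset $L$ of $\binom{X}{2}$ relative to $T$ and the rank $\mathrm{rk}_{T_U}(L_U)$ of the corresponding subset $L_U$ of $\binom{X_U}{2}$ relative to the $X_U$-tree $T_U$ are easily seen to be related by the inequality $\mathrm{rk}_T(L) \le \mathrm{rk}_{T_U}(L_U) + |E - E_U|$. This fact can be used to prove (Dress et al., 2011, Theorem 5) in the same way as Lemma 4.1 has been used above to establish Proposition 4.2.

4.1 An example

To illustrate Proposition 4.2, consider for $X := \{a, b, c, d\}$ the quartet $X$-tree $T := T_{ab|cd}$ shown in Figure 1 (ii). In this case, there is, up to scaling, only one linear relation between the six maps $\lambda^T_{xy}$ ($xy \in \binom{X}{2}$), viz. the relation

$\lambda^T_{ac} + \lambda^T_{bd} = \lambda^T_{ad} + \lambda^T_{bc} : \mathbb{R}^E \to \mathbb{R} : \omega \mapsto 2\omega(e_{ab|cd}) + \sum_{e \in E,\ e \ne e_{ab|cd}} \omega(e)$.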

Thus, $\mathcal{B}(T)$ consists of the four 5-subsets $L$ of $\binom{X}{2}$ that omit exactly one of the four cords $ac, ad, bc, bd$ or, equivalently, satisfy $|L \cap \{ac, ad, bc, bd\}| = 3$ and, hence, of the four subsets $L$ of $\binom{X}{2}$ whose graphs $\Gamma(L)$ are shown in Figure 2 (ii).

Clearly, if $f$ coincides with the unique interior edge of $T_{ab|cd}$, i.e., the edge denoted by $e_{ab|cd}$ in Figure 1 (ii), $T/f$ is equivalent to the star tree $T^* := T^{(X)}$, also shown in Figure 1 (i). Furthermore, the graphs $\Gamma(L)$ corresponding to the bases $L$ in $\mathcal{B}(T^*)$, being minimal strongly non-bipartite graphs with vertex set $X$, must consist of one triangle (for which there are 4 possibilities) to which the remaining element in $X$ is appended by a single edge (for which there are 3 possibilities). So, $\mathcal{B}(T^*)$ consists of 12 bases that form two orbits relative to the symmetry group of $T$, representatives of which are the bases $B_1 := \{ab, bc, ca, da\}$ and $B_2 := \{ab, bc, ca, dc\}$ shown in Figure 2 (i).

For the two cords $db, dc \in \binom{X}{2} - B_1$, we have, putting $f := e_{ab|cd}$,

$\lambda^{T/f}_{db} = \lambda^{T/f}_{da} - \lambda^{T/f}_{ac} + \lambda^{T/f}_{bc}$ and $\lambda^{T/f}_{dc} = \lambda^{T/f}_{da} - \lambda^{T/f}_{ab} + \lambda^{T/f}_{bc}$

while

$\lambda^T_{db}(\omega_f) = \lambda^T_{da}(\omega_f) - \lambda^T_{ac}(\omega_f) + \lambda^T_{bc}(\omega_f) = 1$

and

$\lambda^T_{dc}(\omega_f) = 0 \ne \lambda^T_{da}(\omega_f) - \lambda^T_{ab}(\omega_f) + \lambda^T_{bc}(\omega_f) = 2$

holds, implying that, to bases of type $B_1$, we can add cords of type $dc$, but not cords of type $db$. And for the two cords $da, db \in \binom{X}{2} - B_2$, we have

$\lambda^{T/f}_{da} = \lambda^{T/f}_{dc} - \lambda^{T/f}_{bc} + \lambda^{T/f}_{ab}$ and $\lambda^{T/f}_{db} = \lambda^{T/f}_{dc} - \lambda^{T/f}_{ca} + \lambda^{T/f}_{ab}$

as well as

$1 = \lambda^T_{da}(\omega_f) \ne \lambda^T_{dc}(\omega_f) - \lambda^T_{bc}(\omega_f) + \lambda^T_{ab}(\omega_f) = -1$

and

$1 = \lambda^T_{db}(\omega_f) \ne \lambda^T_{dc}(\omega_f) - \lambda^T_{ca}(\omega_f) + \lambda^T_{ab}(\omega_f) = -1$,

implying that, to bases of type $B_2$, we can add either one of the two missing cords. Obviously, this fully corroborates our previous assertion about $\mathcal{B}(T_{ab|cd})$.
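The counts in this example can be reproduced by brute force. The following Python sketch enumerates the bases of the matroid for the quartet tree $T_{ab|cd}$ and for the star tree $T/f$ by testing every cord set of the right size for full rank. The table of rows, the cord labels (each cord is written with its letters in alphabetical order, so $ca$ appears as `"ac"`) and the function names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Rows for the quartet tree ab|cd, columns (e_a, e_b, e_c, e_d, f).
# Dropping the last column gives the rows for the star tree T/f.
ROWS = {
    "ab": [1, 1, 0, 0, 0],
    "ac": [1, 0, 1, 0, 1],
    "ad": [1, 0, 0, 1, 1],
    "bc": [0, 1, 1, 0, 1],
    "bd": [0, 1, 0, 1, 1],
    "cd": [0, 0, 1, 1, 0],
}

def rank(rows):
    """Rank over the rationals via exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def bases(row_of, num_edges):
    """All cord sets of size |E| whose rows have full rank |E|."""
    return [set(L) for L in combinations(ROWS, num_edges)
            if rank([row_of(c) for c in L]) == num_edges]

quartet_bases = bases(lambda c: ROWS[c], 5)
star_bases = bases(lambda c: ROWS[c][:4], 4)

B1 = {"ab", "bc", "ac", "ad"}   # triangle abc with d attached to a
B2 = {"ab", "bc", "ac", "cd"}   # triangle abc with d attached to c
print(len(quartet_bases), len(star_bases))
print((B1 | {"cd"}) in quartet_bases, (B1 | {"bd"}) in quartet_bases)
```

Membership tests such as `(B2 | {"ad"}) in quartet_bases` can be used in the same way to check which cords may be added to a basis of type $B_2$.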
Our next results require two definitions that will also be important later in this paper: Recall first that, given an $X$-tree $T$ and a subset $Y$ of $X$ of cardinality at least 3, one denotes by $T|_Y$ the $Y$-tree obtained from the minimal subtree of $T$ that connects the leaves in $Y$ by suppressing any resulting vertices of degree 2 (see e.g. (Dress et al., 2011, Section 2.3)), by $V_Y$ and $E_Y$ its vertex and edge set, respectively, and, given in addition any edge weighting $\omega$ of $T$, one denotes by $\omega|_Y$ the induced edge weighting of $T|_Y$, i.e., the edge weighting that maps any edge $\{u, v\} \in E_Y$ onto the sum $\sum_{e \in E(u|v)} \omega(e)$, yielding a surjective $\mathbb{R}$-linear map $\mathrm{res}_Y : \mathbb{R}^E \to \mathbb{R}^{E_Y} : \omega \mapsto \omega|_Y$ such that

$\lambda^T_{yy'}(\omega) = \omega^T(yy') = (\omega|_Y)^{T|_Y}(yy') = \lambda^{T|_Y}_{yy'}(\mathrm{res}_Y(\omega))$

holds for all $\omega \in \mathbb{R}^E$ and all $yy' \in \binom{Y}{2}$.

Fig. 2: (i) Two graphs, $\Gamma(B_1)$ and $\Gamma(B_2)$, representing two of the twelve tight edge-weight lassos of $T^{(\{a,b,c,d\})}$, one from each of the two orbits of such lassos relative to the 8-element symmetry group of $T_{ab|cd}$. (ii) The four graphs associated to the four bases in $\mathcal{B}(T_{ab|cd})$.

It follows that the map $\lambda^T_{yy'} : \mathbb{R}^E \to \mathbb{R}$ coincides, for all $yy' \in \binom{Y}{2}$, with the map $\lambda^{T|_Y}_{yy'} \circ \mathrm{res}_Y$, the composition of the maps $\mathrm{res}_Y$ and $\lambda^{T|_Y}_{yy'}$. So, denoting by $\widehat{\mathrm{res}}_Y$ the dual and necessarily injective map $\widehat{\mathbb{R}^{E_Y}} \to \widehat{\mathbb{R}^E} : \lambda \mapsto \lambda \circ \mathrm{res}_Y$ of the map $\mathrm{res}_Y$, we also have $\langle L \rangle_T = \widehat{\mathrm{res}}_Y(\langle L \rangle_{T|_Y})$ for every subset $L$ of $\binom{Y}{2}$. In consequence, we must also have

$\mathrm{rk}_T(L) = \mathrm{rk}_{T|_Y}(L) \qquad (2)$

for every subset $Y$ of $X$ of cardinality at least 3 and every subset $L$ of $\binom{Y}{2}$. Consequently, every circuit $L \subseteq \binom{Y}{2}$ of $\mathbb{M}(T|_Y)$ must also be a circuit of $\mathbb{M}(T)$, i.e., we have $\mathcal{C}(T|_Y) \subseteq \mathcal{C}(T)$ for every such subset $Y$ of $X$ of cardinality at least 3.

Further, denoting for every $x \in X$ by $e_x \in E$ the unique pendant edge of $T$ containing $x$, we say that a cord $ab$ of $X$ forms (or is) a $T$-cherry if the two edges $e_a, e_b \in E$ share a vertex, and $ab$ is said to form a proper $T$-cherry if this vertex has degree 3. Note that a cord $ab$ of $X$ forms a proper $T$-cherry if and only if $T|_{\{a,b,c,d\}} \simeq T_{ab|cd}$ holds for any two distinct elements $c, d$ in $X - \{a, b\}$ (if any). Note also that, in a binary $X$-tree $T$, every $T$-cherry is a proper $T$-cherry. In addition, such a tree is a caterpillar tree if and only if $|X| = 3$ holds or its leaf set contains exactly two distinct $T$-cherries. We claim:

Proposition 4.3 For an $X$-tree $T$, a cord $ab \in \binom{X}{2}$ is a co-loop of $\mathbb{M}(T)$, i.e., it is contained in every edge-weight lasso for $T$, if and only if $ab$ is a proper $T$-cherry.
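Proposition 4.3 can be illustrated on a small caterpillar tree. The Python sketch below builds the 0/1 rows for a binary caterpillar tree with five leaves and lists the cords whose removal from the ground set lowers the rank, which is the usual test for a co-loop. The leaf labels 1 to 5, the spine positions and the function names are hypothetical choices for this example.

```python
from fractions import Fraction
from itertools import combinations

# Caterpillar tree with leaves 1..5: leaves 1 and 2 hang off spine vertex 0,
# leaf 3 off spine vertex 1, leaves 4 and 5 off spine vertex 2.
SPINE = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2}
LEAVES = sorted(SPINE)
NUM_EDGES = len(LEAVES) + 2      # five pendant edges plus two spine edges

def row(x, y):
    """0/1 row of cord xy: the five pendant edges first, then the two spine edges."""
    pendant = [1 if leaf in (x, y) else 0 for leaf in LEAVES]
    lo, hi = sorted((SPINE[x], SPINE[y]))
    interior = [1 if lo <= k < hi else 0 for k in range(2)]
    return pendant + interior

def rank(rows):
    """Rank over the rationals via exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

CORDS = list(combinations(LEAVES, 2))

def is_coloop(cord):
    """A cord is a co-loop iff the remaining cords no longer have rank |E|."""
    rest = [row(*c) for c in CORDS if c != cord]
    return rank(rest) < NUM_EDGES

coloops = [c for c in CORDS if is_coloop(c)]
print(coloops)
```

In this tree the only cherries are the cords {1, 2} and {4, 5}, and both are proper because the tree is binary.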

Proof: If $|X| = 3$ holds, the set $\binom{X}{2}$ is the only basis of $\mathbb{M}(T)$ while, if $n \ge 4$ holds and $ab$ is a proper $T$-cherry, the cord $ab$ must be contained in every edge-weight lasso for $T$ in view of (Dress et al., 2011, Corollary 1). Conversely, if $ab$ does not form a proper $T$-cherry, there must exist two distinct elements $c, d$ in $X - \{a, b\}$ such that $\lambda^T_{ab} + \lambda^T_{cd} = \lambda^T_{ac} + \lambda^T_{bd}$ holds. This implies that $ab$ cannot be a co-loop of $\mathbb{M}(T)$.

5 Main results

5.1 $\mathbb{M}(T)$ determines $T$ up to equivalence

We begin this section by showing that the matroid associated with an $X$-tree determines that $X$-tree, up to equivalence:

Theorem 1 One has $\mathbb{M}(T_1) = \mathbb{M}(T_2) \Leftrightarrow T_1 \simeq T_2$ for any two $X$-trees $T_1$ and $T_2$.

Proof: We first note that, if $T$ is any $X$-tree and $Y = \{a, b, c, d\}$ is a 4-subset of $X$, then we have $T|_Y \simeq T_{ab|cd}$ if and only if there exists at least one basis $B$ of $\mathbb{M}(T)$ containing the set $L_{abcd} := \{ab, bc, cd, da\}$: Indeed, if $T|_Y \simeq T_{ab|cd}$ holds, the four maps $\lambda^{T|_Y}_{xy}$ ($xy \in L_{abcd}$) and, hence, also the corresponding four maps $\lambda^T_{xy}$ ($xy \in L_{abcd}$) are linearly independent. So, by the matroid augmentation property of independent sets, there exists some $B \in \mathcal{B}(T)$ containing these four cords. Conversely, if $T|_Y \not\simeq T_{ab|cd}$ and, therefore, also $\lambda^T_{ab} + \lambda^T_{cd} = \lambda^T_{ad} + \lambda^T_{bc}$ holds, $L_{abcd}$ cannot be part of a basis $B \in \mathcal{B}(T)$.

It follows that $\mathbb{M}(T_1) = \mathbb{M}(T_2)$ implies $Q(T_1) = Q(T_2)$ where, for any $X$-tree $T$, $Q(T)$ is defined by

$Q(T) := \{ab|cd : \{a, b, c, d\} \in \binom{X}{4},\ T|_{\{a,b,c,d\}} \simeq T_{ab|cd}\}$.

However, it has been observed already by H. Colonius and H. Schultze in (Colonius and Schultze, 1977, 1981) that $Q(T_1) = Q(T_2)$ holds for any two $X$-trees $T_1, T_2$ if and only if one has $T_1 \simeq T_2$ (for a more recent account, see (Dress et al., 2011a, Theorem 2.7) or (Semple and Steel, 2003, Corollary 6.3.8)).

5.2 The rank of topological lassos

Now assume that $n \ge 4$ holds and recall that the following three assertions are, according to (Dress et al., 2011, Theorem 8), equivalent in this case for any $X$-tree $T = (V, E)$ and any bipartition of $X$ into two disjoint non-empty subsets $A, B$:

(split-i) The subset $A \vee B$ of $\binom{X}{2}$ is a topological lasso for $T$,
(split-ii) $A \vee B$ is a t-cover of $T$ (i.e., given any interior vertex $v$ of $T$ and any two edges $e, e' \in E$ with $v \in e, e'$, there exists some cord $xy$ in $A \vee B$ with $e, e' \in E(x|y)$, see (Dress et al., 2011a, Section 7)),
(split-iii) $A \cap \{a, b\} \ne \emptyset \ne B \cap \{a, b\}$ holds for every $T$-cherry $ab$. (iii)

And it was also noted in this context that such bipartitions exist if and only if every $T$-cherry is a proper $T$-cherry and $n \ge 4$ holds. Here, we want to complement this result as follows:

Footnote (iii): When stating this theorem in (Dress et al., 2011), we omitted to mention that one needs to assume that $n \ge 4$ holds. Indeed, it is simply wrong for $n = 3$ for trivial reasons as (split-i) holds for all bipartitions $A, B$ of the leaf set of an $X$-tree with 3 leaves, but (split-ii) and (split-iii) never hold in this case. Yet, the assumption $n \ge 4$ will always be made here when applying this theorem.

Theorem 2 Given any $X$-tree $T = (V, E)$, one has $\mathrm{rk}_T(L) \le |E| - 1$ for every bipartite subset $L$ of $\binom{X}{2}$. Furthermore, the following assertions are equivalent for every bipartite subset $L$ of $\binom{X}{2}$ (so that $\mathrm{rk}_T(L) \le |E| - 1$ holds):

(i) The rank $\mathrm{rk}_T(L)$ of $L$ coincides with $|E| - 1$.
(ii) There exists some cord $xy \in \binom{X}{2}$ such that $L \cup \{xy\}$ is an edge-weight lasso for $T$.
(iii) $L$ is connected and $L \cup \{xy\}$ is an edge-weight lasso for $T$ for some cord $xy \in \binom{X}{2}$ if and only if $L \cup \{xy\}$ is not bipartite.
(iv) $L$ is connected, the closure $[L]_T$ of $L$ relative to $\mathbb{M}(T)$ coincides with the edge set of the (necessarily unique) complete bipartite graph with vertex set $X$ whose edge set contains $L$, i.e., $[L]_T$ coincides with the set $A \vee B$ in case the two subsets $A, B$ of $X$ form the (necessarily unique) bipartition of $X$ with $L \subseteq A \vee B$, and this set forms a hyperplane in $\mathbb{M}(T)$, i.e., a maximal subset of $\binom{X}{2}$ of rank smaller than $|E|$.

Proof: Assume that $A$ and $B$ are two subsets of $X$ that form a bipartition of $X$ with $L \subseteq A \vee B$, and let $\omega_{A|B} \in \mathbb{R}^E$ denote the map in $\mathbb{R}^E$ that maps every interior edge of $T$ onto 0, every pendant edge $e$ that is incident with some leaf in $A$ onto $+1$, and every pendant edge $e$ that is incident with some leaf in $B$ onto $-1$. Clearly,

$\lambda^T_{xy}(\omega_{A|B}) = +2$ if $x, y \in A$, $\quad -2$ if $x, y \in B$, $\quad 0$ otherwise $\qquad (3)$

holds for every cord $xy \in \binom{X}{2}$. In particular, one has $\lambda^T_{xy}(\omega_{A|B}) = 0$ for some cord $xy \in \binom{X}{2}$ if and only if $xy \in A \vee B$ holds. So, standard matroid theory implies that $\mathrm{rk}_T(L) \le \mathrm{rk}_T(A \vee B) \le |E| - 1$ must hold for every subset $L$ of a set of the form $A \vee B$ for some bipartition $A, B$ of $X$, that is, for every bipartite subset $L$ of $\binom{X}{2}$, which is just our first assertion.

(i) $\Leftrightarrow$ (ii): It follows also from standard matroid theory that the rank $\mathrm{rk}_T(L)$ of an arbitrary non-empty subset $L$ of $\binom{X}{2}$ with $\mathrm{rk}_T(L) \le |E| - 1$ coincides with $|E| - 1$ if and only if there exists some cord $xy \in \binom{X}{2}$ such that $L \cup \{xy\}$ is an edge-weight lasso for $T$.

(i) $\Rightarrow$ (iii): And standard matroid theory implies also that, if $L$ is any subset of $\binom{X}{2}$, one has $\mathrm{rk}_T(L) = |E| - 1$ if and only if there exists, up to scaling, exactly one non-zero map $\omega \in \mathbb{R}^E$ with $\lambda^T_{ab}(\omega) = 0$ for all $ab \in L$ and that, in this case, $L \cup \{xy\}$ is an edge-weight lasso for $T$ for some cord $xy \in \binom{X}{2}$ if and only if $\lambda^T_{xy}(\omega) \ne 0$ holds for this map $\omega$. In consequence, if $L \subseteq \binom{X}{2}$ is bipartite and $\mathrm{rk}_T(L) = |E| - 1$ holds, our observations above imply that there must be a unique bipartition $A, B$ of $X$ with $L \subseteq A \vee B$ and that, given any cord $xy \in \binom{X}{2}$, the union $L \cup \{xy\}$ is an edge-weight lasso for $T$ if and only if $\lambda^T_{xy}(\omega_{A|B}) \ne 0$ and, hence, if and only if $xy \notin A \vee B$ holds. Furthermore, the fact that there is only one bipartition $A, B$ of $X$ with $L \subseteq A \vee B$ implies that $L$ must be connected and that, in consequence, $xy \notin A \vee B$ holds if and only if $L \cup \{xy\}$ is not bipartite. So, we see that $L \cup \{xy\}$ is indeed an edge-weight lasso for $T$ if and only if $L \cup \{xy\}$ is not bipartite, as claimed.

(iii) $\Rightarrow$ (ii): This is trivial as any connected graph with at least three edges can be extended by a single edge to become a non-bipartite graph.

(i), (ii), (iii) $\Rightarrow$ (iv): If $L$ is connected and bipartite, there exists exactly one bipartition $A, B$ of $X$ with $L \subseteq A \vee B$, and $L \cup \{xy\}$ is bipartite for some cord $xy \in \binom{X}{2}$ if and only if $xy \in A \vee B$ holds. So, if in addition also $|E| - 1 = rk_T(L)$ holds, $[L]_T$ must be a hyperplane and we must have $|E| - 1 = rk_T(L) \leq rk_T(A \vee B) \leq |E| - 1$. Thus, $rk_T(L) = rk_T(A \vee B) = |E| - 1$ as well as

\[
[L]_T = \Big\{ xy \in \tbinom{X}{2} : rk_T(L \cup \{xy\}) \leq |E| - 1 \Big\} = \Big\{ xy \in \tbinom{X}{2} : L \cup \{xy\} \text{ is bipartite} \Big\} = A \vee B,
\]

as claimed.

(iv) $\Rightarrow$ (i): This is trivial in view of the fact that $rk_T(L) = rk_T([L]_T)$ holds for every non-empty subset $L$ of $\binom{X}{2}$ and the fact that any maximal subset of $\binom{X}{2}$ of rank smaller than $|E|$ must have rank $|E| - 1$.

The above theorem has an interesting application regarding topological lassos:

Theorem 3 (i) Given any $X$-tree $T = (V, E)$ and any bipartite subset $L$ of $\binom{X}{2}$ with $rk_T(L) = |E| - 1$, the hyperplane $[L]_T$ is a topological lasso for $T$ if and only if every $T$-cherry is a proper $T$-cherry (i.e., if and only if there exists at least one bipartition $A, B$ of $X$ such that $A \vee B$ is a topological lasso for $T$). In this case, $[L]_T \cup \{xy\}$ must be a strong lasso for $T$ for every cord $xy \in \binom{X}{2}$ for which $L \cup \{xy\}$ is not bipartite, that is, with $xy \notin [L]_T$.

(ii) Conversely, if $L$ is any topological lasso for $T$ with $rk_T(L) < |E|$, then $L$ must be bipartite, $rk_T(L) = |E| - 1$ must hold, every $T$-cherry must be a proper $T$-cherry, and $L \cup \{xy\}$ (and not only $[L]_T \cup \{xy\}$) must be a strong lasso for $T$ for every cord $xy \in \binom{X}{2}$ for which $L \cup \{xy\}$ is not bipartite.

Proof: (i): In view of the equivalence (i) $\Leftrightarrow$ (iv) of Theorem 2, there must exist a (necessarily unique) bipartition of $X$ into two disjoint subsets $A$ and $B$ such that the hyperplane $[L]_T$ coincides with the set $A \vee B$.
Furthermore, this set must also be a topological lasso for $T$ if every $T$-cherry is a proper $T$-cherry: Indeed, in view of the results from (Dress et al., 2011) quoted above, it suffices to show that, if $ab$ is a proper $T$-cherry and $a \in A$ holds, we must have $b \in B$. Yet, otherwise, we would have $b \in A$ and, therefore, $ab \notin L$. Since $n \geq 4$, this would allow us to construct yet another non-zero map $\omega_{ab} \in \mathbb{R}^E$ with $\lambda^T_{xy}(\omega_{ab}) = 0$ for all cords $xy \in L$ that is not a scalar multiple of $\omega_{A|B}$: Indeed, if the two edges $e_a, e_b \in E$ containing $a$ and, respectively, $b$ share the vertex $v$, there would exist exactly one further edge $e_{ab} \in E$ with $v \in e_{ab}$, and putting $\omega_{ab}(e_a) = \omega_{ab}(e_b) := 1$, $\omega_{ab}(e_{ab}) := -1$, and $\omega_{ab}(e) := 0$ for all other edges $e \in E$ would yield such a map $\omega_{ab}$, as required. So, $[L]_T = A \vee B$ must indeed be a topological lasso for $T$, as claimed.

For the "only if" direction, assume that $[L]_T$ is a topological lasso for $T$. By Theorem 2 ((i) $\Leftrightarrow$ (iv)) we have $[L]_T = A \vee B$, and from the implication (split-i) $\Rightarrow$ (split-iii) (near the start of Section 5.2), $T$ cannot contain three leaves all of which are adjacent to a common vertex. Thus every $T$-cherry must be a proper $T$-cherry.

(ii) Assume that $L$ is a topological lasso for $T$ of rank less than $|E|$. Then, there must exist a non-zero map $\omega_0 \in \mathbb{R}^E$ with $\lambda_{xy}(\omega_0) = 0$ for all $xy \in L$. If $\omega_0(e_0) \neq 0$ held for some interior edge $e_0 \in E$, we could find some proper edge-weighting $\omega \in \mathbb{R}^E_{\geq 0}$ of $T$ with $\omega(e_0) = |\omega_0(e_0)|$, while $\omega(e) > |\omega_0(e)|$ holds for all edges $e \in E \setminus \{e_0\}$. This, in turn, would imply that the map $\omega' := \omega - \mathrm{sgn}\big(\omega_0(e_0)\big)\, \omega_0$ is a map in $\mathbb{R}^E_{\geq 0}$ that is a non-proper edge-weighting of $T$ for which $D_{(T,\omega)}|_L = D_{(T,\omega')}|_L$ holds. In view of the last remark in (Dress et al., 2011, Subsection 2.2), this would contradict our assumption that $L$ is a topological lasso for $T$.

Consequently, the support $\mathrm{supp}(\omega_0) := \{e \in E : \omega_0(e) \neq 0\}$ of $\omega_0$ must be contained in the set $\{e_x : x \in X\}$ of pendant edges of $T$, implying that $\lambda_{xy}(\omega_0) = \omega_0(e_x) + \omega_0(e_y)$ must hold for every cord $xy \in \binom{X}{2}$. Thus, putting $A := \{x \in X : \omega_0(e_x) > 0\}$ and $B := \{y \in X : \omega_0(e_y) < 0\}$, we see that $x \in A \Leftrightarrow y \in B$ must hold for every cord $xy \in \binom{X}{2}$ with $\lambda_{xy}(\omega_0) = 0$ and, hence, for all $xy \in L$. Thus, as $L$ must be connected for every topological lasso $L$ for $T$ in view of (Dress et al., 2011, Theorem 4), it follows that the pair $A, B$ of subsets of $X$ forms a bipartition of $X$, that $L$ must be bipartite relative to this partition, and that $A \vee B$ must also be a topological lasso for $T$. Consequently, every $T$-cherry must be a proper $T$-cherry and $\omega_0$ must be a (positive) scalar multiple of the map $\omega_{A|B} \in \mathbb{R}^E$ defined above. In particular, there can be, up to scaling, only one non-zero map $\omega \in \mathbb{R}^E$ with $\lambda_{xy}(\omega) = 0$ for all $xy \in L$, implying that the rank of $L$ must indeed coincide with $|E| - 1$ and that $L \cup \{xy\}$ must, therefore, be a strong lasso for $T$ for every cord $xy \in \binom{X}{2}$ for which $L \cup \{xy\}$ is not bipartite, i.e., for every cord $xy \in \binom{X}{2} \setminus (A \vee B)$.

Corollary 5.1 Given any $X$-tree $T = (V, E)$, the following assertions are equivalent:

(i) There exists a bipartite subset $L$ of $\binom{X}{2}$ that is a topological lasso for $T$.
(ii) There exists a topological lasso $L$ for $T$ with $rk_T(L) < |E|$.
(ii') There exists a topological lasso $L$ for $T$ with $rk_T(L) = |E| - 1$.
(iii) Every $T$-cherry is a proper $T$-cherry.
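As a small sanity check of the rank statements above, here is a worked computation on the quartet tree $T_{ab|cd}$. It is added for illustration only and is not part of the original argument; the edge labels $e_a, e_b, e_c, e_d$ (pendant edges) and $g$ (the single interior edge) are our own assumptions. Take $A = \{a, c\}$ and $B = \{b, d\}$, so that $A \vee B = \{ab, ad, bc, cd\}$ and both $T$-cherries $ab$ and $cd$ meet $A$ and $B$. A map $\omega \in \mathbb{R}^E$ with $\lambda_{xy}(\omega) = 0$ for all $xy \in A \vee B$ must satisfy

\[
\begin{aligned}
\omega(e_a) + \omega(e_b) &= 0, &\qquad \omega(e_c) + \omega(e_d) &= 0, \\
\omega(e_a) + \omega(g) + \omega(e_d) &= 0, &\qquad \omega(e_b) + \omega(g) + \omega(e_c) &= 0.
\end{aligned}
\]

Adding the two equations in the second row and subtracting those in the first row gives $2\,\omega(g) = 0$, hence $\omega(g) = 0$, $\omega(e_b) = \omega(e_d) = -\omega(e_a)$ and $\omega(e_c) = \omega(e_a)$. So every solution is a scalar multiple of $\omega_{A|B} = (1, -1, 1, -1, 0)$ (in the order $e_a, e_b, e_c, e_d, g$), and one obtains $rk_T(A \vee B) = 5 - 1 = 4 = |E| - 1$. For the two remaining cords one finds $\lambda_{ac}(\omega_{A|B}) = +2$ and $\lambda_{bd}(\omega_{A|B}) = -2$, in line with the sign pattern used in the proof of Theorem 2.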
A simple example to illustrate Corollary 5.1 is presented in Figure 3.

6 Concluding comments

6.1 $X$-trees $T$ for which $M(T)$ is a non-binary matroid

Let us note finally that the matroid $M(T)$ associated to an $X$-tree $T$ is a non-binary matroid whenever there exist three disjoint cords in $\binom{X}{2}$ each of which forms a $T$-cherry: Indeed, assume that $x_1, x_2, x_3, x_4, x_5, x_6$ are six distinct elements in $X$ such that each of the three cords $x_i x_{i+3}$ ($i = 1, 2, 3$) forms a $T$-cherry. It is then easy to check that the two subsets

\[
L_1 := \{x_1x_2,\ x_2x_3,\ x_3x_4,\ x_4x_5,\ x_5x_6,\ x_6x_1\} \quad \text{and} \quad L_2 := \{x_1x_3,\ x_3x_4,\ x_4x_6,\ x_6x_1\}
\]

of $\binom{X}{2}$ are circuits in $M(T)$ while their symmetric difference

\[
L := L_1 \,\Delta\, L_2 = \{x_1x_2,\ x_2x_3,\ x_3x_1,\ x_4x_5,\ x_5x_6,\ x_6x_4\}
\]

is an independent subset of $\binom{X}{2}$ in $M(T)$.

Clearly, this implies that a binary $X$-tree $T$ for which $M(T)$ is a binary matroid must be a caterpillar tree. More generally, an arbitrary $X$-tree $T$ for which $M(T)$ is a binary matroid must be either a star tree with at most five leaves or an $X$-tree for which, as in the case of the (binary) caterpillar trees, two interior vertices $u$ and $v$ of $T$ exist for which the path from $u$ to $v$ in $T$ passes every interior vertex of $T$ and all of these, except perhaps $u$ and $v$, have degree 3 while the two vertices $u$ and $v$ have degree 3 or 4. We will show in a separate paper that, conversely, $M(T)$ is a binary matroid whenever this holds.
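The circuit and independence claims for $L_1$, $L_2$ and $L_1 \Delta L_2$ can be checked numerically on a concrete tree. The following is a minimal sketch, assuming that $rk_T$ of a set of cords equals the rank of the 0/1 vectors recording which edges lie on each leaf-to-leaf path. The six-leaf tree, its vertex names `v1`, `v2`, `v3`, `c`, and the helpers `path_edges`, `rank`, `cord_rank` and `is_circuit` are hypothetical names introduced here, not taken from the paper.

```python
from fractions import Fraction

# Hypothetical example tree (our own labels): cherry vertices v1, v2, v3 joined
# to a centre c, so that x1x4, x2x5 and x3x6 are the three T-cherries.
EDGES = [("x1", "v1"), ("x4", "v1"), ("x2", "v2"), ("x5", "v2"),
         ("x3", "v3"), ("x6", "v3"), ("v1", "c"), ("v2", "c"), ("v3", "c")]


def path_edges(edges, x, y):
    """Indices of the edges on the unique path from x to y in the tree."""
    adj = {}
    for i, (p, q) in enumerate(edges):
        adj.setdefault(p, []).append((q, i))
        adj.setdefault(q, []).append((p, i))
    stack = [(x, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == y:
            return set(used)
        for nxt, i in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, used + [i]))
    raise ValueError("no path between the given vertices")


def rank(rows):
    """Rank of a rational matrix by exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def cord_rank(edges, cords):
    """Assumed model of rk_T: rank of the 0/1 path-incidence vectors."""
    rows = []
    for cord in cords:
        x, y = tuple(cord)
        on_path = path_edges(edges, x, y)
        rows.append([1 if i in on_path else 0 for i in range(len(edges))])
    return rank(rows)


def is_circuit(edges, cords):
    """Dependent, but every subset obtained by dropping one cord is independent."""
    cords = list(cords)
    if cord_rank(edges, cords) != len(cords) - 1:
        return False
    return all(cord_rank(edges, cords[:i] + cords[i + 1:]) == len(cords) - 1
               for i in range(len(cords)))


def cord(x, y):
    return frozenset((x, y))


L1 = {cord("x1", "x2"), cord("x2", "x3"), cord("x3", "x4"),
      cord("x4", "x5"), cord("x5", "x6"), cord("x6", "x1")}
L2 = {cord("x1", "x3"), cord("x3", "x4"), cord("x4", "x6"), cord("x6", "x1")}
L = L1 ^ L2  # symmetric difference: the two triangles x1x2x3 and x4x5x6

print(is_circuit(EDGES, L1), is_circuit(EDGES, L2), cord_rank(EDGES, L), len(L))
```

In a binary matroid the symmetric difference of two circuits is a disjoint union of circuits, so a non-empty independent symmetric difference, as reported by the last line for this tree, is what rules out binarity. Exact `Fraction` arithmetic is used so that the rank computation does not depend on floating-point tolerances.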

[Figure 3: panel (i) shows the $X$-tree $T$; panel (ii) shows the graph $\Gamma(L)$.]

Fig. 3: (i) An $X$-tree $T = (V, E)$ for $X := \{a, b, c, d, e, f\}$ with only proper $T$-cherries, and (ii) a bipartite topological lasso $L$ with $rk_T(L) = 8$ $(= |E| - 1)$. Note that both $T$-cherries $ab$ and $ef$ have a non-empty intersection with both parts $\{a, d, f\}$ and $\{b, c, e\}$ of the bipartition of $X$ induced by $L$.

6.2 Minimal strong lassos do not form a matroid

Although the minimal edge-weight lassos for any $X$-tree form a matroid defined on $\binom{X}{2}$, the same is not always true for the minimal strong lassos. To see this, let $X = \{a, b, c, d, e, f\}$ and consider the sets

\[
L_1 := \{ab, ac, ad, bc, bd, cd, ef, ae, be, ce, de, df\} \quad \text{and} \quad L_2 := \{ab, ac, ad, bc, bd, cd, ef, ae, be, cf, df\}.
\]

Both $L_1$ and $L_2$ are minimal strong lassos for the $X$-tree that has one interior vertex adjacent to $a, b, c, d$, and a second interior vertex adjacent to $e, f$; however, $L_1$ has one more element than $L_2$.

6.3 Pointed $x$-covers of binary $X$-trees $T$ that are bases of $M(T)$

When $T$ is a binary $X$-tree, some particular bases in $B(T)$ are easily described: Select any element $x \in X$ and, for each one of the $n - 2$ interior vertices $v$ of $T$, consider the three components of the graph obtained from $T$ by deleting $v$. Select an element of $X$ from each of the two components that do not contain $x$, and denote this pair by $y_v = y_v(x)$, $z_v = z_v(x)$. Put

\[
L = \{ax : a \in X \setminus \{x\}\} \cup \{y_v z_v : v \in V \setminus X\},
\]

and let $P_x(T)$ denote the collection of subsets of $\binom{X}{2}$ that can be generated in this way (by the various choices of $y_v$ and $z_v$ as $v$ varies).
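The construction of $P_x(T)$ can be sketched in code as follows. This is an illustrative sketch only, run on the quartet tree with leaves $a, b, c, d$ and interior vertices $u$ (adjacent to $a, b$) and $v$ (adjacent to $c, d$). It assumes, as above, that $rk_T$ is the rank of the 0/1 path-incidence vectors of the cords, and all function names (`adjacency`, `path_edges`, `rank`, `cord_rank`, `leaves_beyond`, `pointed_covers`) are hypothetical helpers introduced here.

```python
from fractions import Fraction
from itertools import product

# Quartet tree ab|cd: u is adjacent to a and b, v is adjacent to c and d.
EDGES = [("a", "u"), ("b", "u"), ("c", "v"), ("d", "v"), ("u", "v")]
LEAVES = {"a", "b", "c", "d"}


def adjacency(edges):
    adj = {}
    for i, (p, q) in enumerate(edges):
        adj.setdefault(p, []).append((q, i))
        adj.setdefault(q, []).append((p, i))
    return adj


def path_edges(edges, x, y):
    """Indices of the edges on the unique path from x to y in the tree."""
    adj = adjacency(edges)
    stack = [(x, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == y:
            return set(used)
        for nxt, i in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, used + [i]))
    raise ValueError("no path between the given vertices")


def rank(rows):
    """Rank of a rational matrix by exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def cord_rank(edges, cords):
    """Assumed model of rk_T: rank of the 0/1 path-incidence vectors."""
    rows = []
    for x, y in cords:
        on_path = path_edges(edges, x, y)
        rows.append([1 if i in on_path else 0 for i in range(len(edges))])
    return rank(rows)


def leaves_beyond(adj, leaves, v, start):
    """Leaves in the component of T - v that contains the neighbour `start`."""
    seen, stack = {v, start}, [start]
    while stack:
        node = stack.pop()
        for nxt, _ in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted((seen - {v}) & leaves)


def pointed_covers(edges, leaves, x):
    """All cord sets {ax} + {y_v z_v} for a binary tree, as lists of pairs."""
    adj = adjacency(edges)
    star = [(a, x) for a in sorted(leaves - {x})]
    per_vertex = []
    for v in sorted(set(adj) - leaves):
        comps = [leaves_beyond(adj, leaves, v, nb) for nb, _ in adj[v]]
        # Binary tree: exactly two components of T - v avoid x.
        side1, side2 = [c for c in comps if x not in c]
        per_vertex.append([(y, z) for y in side1 for z in side2])
    return [star + list(choice) for choice in product(*per_vertex)]


covers = pointed_covers(EDGES, LEAVES, "d")
for L in covers:
    print(sorted("".join(sorted(c)) for c in L), cord_rank(EDGES, L))
```

For $x := d$ the enumeration covers every choice of $y_v, z_v$, and each resulting cord set can be compared against $|E| = 2n - 3$ to see whether it is a basis under the assumed rank model.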

For example, considering again the quartet $X$-tree $T := T_{ab|cd}$ with its two interior vertices $u$ and $v$ as shown in Figure 1 (ii), we may choose $x := d$, $y_u = y_v := a$, $z_u := b$, $z_v = c$ and obtain the lasso $L = \{ad, bd, cd\} \cup \{ab, ac\}$ as an element of $P_d(T_{ab|cd})$.

Clearly, $P_x(T)$ is a subset of $B(T)$ for each $x \in X$, since the elements of $P_x(T)$ correspond precisely to the so-called pointed $x$-covers of $T$ of cardinality $2n - 3$ and, by Theorem 7 of (Dress et al., 2011), any pointed $x$-cover $L$ of a binary $X$-tree is not only an edge-weight, but a strong lasso for that tree. We note also that, given two distinct elements $x_1, x_2$ in $X$, a subset $L$ of $\binom{X}{2}$ cannot simultaneously be a pointed $x_1$-cover in $P_{x_1}(T)$ and a pointed $x_2$-cover in $P_{x_2}(T)$ unless $T$ is a caterpillar tree with $x_1$ and $x_2$ at opposite ends of the tree: Indeed, if there exists some $L \in P_{x_1}(T) \cap P_{x_2}(T)$, we must have

\[
\{y_v(x_1) z_v(x_1) : v \in V \setminus X\} = \{x_2 a : a \in X \setminus \{x_1, x_2\}\},
\]

implying that the path from $x_1$ to $x_2$ in $T$ must pass through every interior vertex of $T$.

Acknowledgements

We thank Geoffrey Whittle for helpful comments and references to some relevant matroid-theory literature. We also thank the two anonymous reviewers. A.D. thanks the CAS and the MPG for financial support; M.S. thanks the Royal Society of New Zealand under its Marsden and James Cook Fellowship scheme.

References

H. Colonius and H. H. Schultze. Trees constructed from empirical relations. Braunschweiger Berichte aus dem Institut für Psychologie 1, 1977.
H. Colonius and H. H. Schultze. Tree structure from proximity data. Brit. J. Math. Stat. Psych., 34:167–180, 1981.
A. Dress, K. T. Huber, J. Koolen, V. Moulton, and A. Spillner. Basic phylogenetic combinatorics. Cambridge University Press, 2011a.
A. Dress, K. T. Huber, and M. Steel. Lassoing a phylogenetic tree I: Basic properties, shellings and covers. J. Math. Biol., 65(1):77–105, 2011.
A. Guénoche, B. Leclerc, and V. Makarenkov. On the extension of a partial metric to a tree metric. Discr. Appl. Math., 276:229–248, 2004.
J. Oxley. Matroid Theory (2nd edition). Oxford University Press, USA, 2011.
M. J. Sanderson, M. M. McMahon, and M. Steel. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol. Biol., 10:155, 2010.
C. Semple and M. Steel. Phylogenetics. Oxford University Press, 2003.
D. A. Welsh. Matroid theory. Academic Press, New York, 1976.
