The expected value of the squared euclidean cophenetic metric under the Yule and the uniform models


Gabriel Cardona, Arnau Mir, Francesc Rosselló
Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain
Corresponding author. Email addresses: gabriel.cardona@uib.es (Gabriel Cardona), arnau.mir@uib.es (Arnau Mir), cesc.rossello@uib.es (Francesc Rosselló)
arXiv:1301.5131v1 [q-bio.PE], January 2013. Preprint submitted to Elsevier.

Abstract

The cophenetic metrics $d_{\varphi,p}$, for $p\in\{0\}\cup[1,\infty[$, are a recent addition to the kit of available distances for the comparison of phylogenetic trees. Based on a fifty-year-old idea of Sokal and Rohlf, these metrics compare phylogenetic trees on the same set of taxa by encoding them by means of their vectors of cophenetic values of pairs of taxa and depths of single taxa, and then computing the $L^p$ norm of the difference of the corresponding vectors. In this paper we compute the expected value of the square of $d_{\varphi,2}$ on the space of fully resolved rooted phylogenetic trees with $n$ leaves, under the Yule and the uniform probability distributions.

Keywords: Phylogenetic tree, Cophenetic metric, Uniform model, Yule model, Sackin index, Total cophenetic index.

1. Introduction

The definition and study of metrics for the comparison of rooted phylogenetic trees on the same set of taxa is a classical problem in phylogenetics [10, Ch. 30], and many metrics have been introduced so far with this purpose. A recent addition to the set of metrics available in this context are the cophenetic metrics $d_{\varphi,p}$ introduced in [8]. Based on a fifty-year-old idea of Sokal and Rohlf, these metrics compare phylogenetic trees on the same set of taxa by first encoding the trees by means of their vectors of cophenetic values of pairs of taxa and depths of single taxa, and then computing the $L^p$ norm of the difference of the corresponding vectors.

Once the dissimilarity between two phylogenetic trees has been computed through a given metric, it is convenient in many situations to assess its significance. One possibility is to compare the value obtained with its expected, or mean, value: is it much larger, much smaller, or similar? [28] This makes it necessary to study the distribution of the metric or, at least, to have a formula for the expected value of the metric for any number $n$ of leaves. The distribution of several metrics has been studied so far: see, for instance, [5, 6, 16, 17, 28].

The expected value of a distance depends on the probability distribution on the space of phylogenetic trees under consideration. The most popular distribution on the space $\mathcal{T}_n$ of binary phylogenetic trees with $n$ leaves is the uniform distribution, under which all trees in $\mathcal{T}_n$ are equiprobable. But phylogeneticists also consider other probability distributions on $\mathcal{T}_n$, defined through stochastic models of evolution [10, Ch. 33]. The most popular is the so-called Yule model [14, 29], defined by an evolutionary process where, at each step, each currently extant species can give rise, with the same probability, to two new species. Under this model, different phylogenetic trees with the same number of leaves may have different probabilities, which depend on their shape.

In this paper we provide explicit formulas for the expected values under the uniform and the Yule models of the square of the euclidean cophenetic metric $d_{\varphi,2}$. The proofs of these formulas are based on long and tedious algebraic computations. To ease the task of the reader interested only in the formulas and the path leading to them, but not in the details, we have moved these computations to an Appendix at the end of the paper.

Besides the aforementioned application of this value in the assessment of tree comparisons, the knowledge of formulas for the expected value of $d_{\varphi,2}^2$ under different models may allow the use of $d_{\varphi,2}$ to test stochastic models of tree growth, a popular line of research in recent years which so far has been mostly based on shape indices; see, for instance, [3, 19].
As a proof of concept, in §4 we report on a basic, preliminary test of this kind performed on the binary phylogenetic trees contained in the TreeBASE database [20].

2. Preliminaries

In this paper, by a phylogenetic tree on a set $S$ of taxa we mean a fully resolved, or binary, rooted tree with its leaves bijectively labeled in $S$. We understand such a rooted tree as a directed graph, with its arcs pointing away from the root. To simplify the language, we shall always identify a leaf of a phylogenetic tree with its label. We shall also use the term phylogenetic tree with $n$ leaves to refer to a phylogenetic tree on the set $\{1,\ldots,n\}$. We shall denote by $\mathcal{T}(S)$ the space of all phylogenetic trees on $S$ and by $\mathcal{T}_n$ the space of all phylogenetic trees with $n$ leaves.

Let $T$ be a phylogenetic tree. If there exists a directed path from $u$ to $v$ in $T$, we shall say that $v$ is a descendant of $u$ and also that $u$ is an ancestor of $v$. The lowest common ancestor $\mathrm{LCA}_T(u,v)$ of a pair of nodes $u,v$ in $T$ is the unique common ancestor of $u$ and $v$ that is a descendant of every other common ancestor of them. The depth $\delta_T(v)$ of a node $v$ in $T$ is the distance (in number of arcs) from the root of $T$ to $v$. The cophenetic value $\varphi_T(i,j)$ of a pair of leaves $i,j$ in $T$ is the depth of their LCA. To simplify the notations, we shall often write $\varphi_T(i,i)$ to denote the depth $\delta_T(i)$ of a leaf $i$.
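The definitions of depth and cophenetic value translate directly into code. The following Python sketch is an illustration only and is not taken from the paper: it assumes that a tree is encoded as nested pairs (a leaf is an integer label, an internal node is a 2-tuple of subtrees), and the helper names `leaves` and `cophenetic_values` are hypothetical.

```python
def leaves(tree):
    """Return the list of leaf labels of a tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def cophenetic_values(tree, depth=0, values=None):
    """Map each pair (i, j), i <= j, to phi_T(i, j); phi_T(i, i) is the depth of leaf i."""
    if values is None:
        values = {}
    if not isinstance(tree, tuple):
        values[(tree, tree)] = depth  # a leaf: record its depth
        return values
    left, right = tree
    # The current node is the LCA of every pair with one leaf on each side.
    for i in leaves(left):
        for j in leaves(right):
            values[(min(i, j), max(i, j))] = depth
    cophenetic_values(left, depth + 1, values)
    cophenetic_values(right, depth + 1, values)
    return values


# Assumed example trees on the leaves 1, 2, 3, 4: a balanced tree and a caterpillar.
balanced = ((1, 2), (3, 4))
caterpillar = (1, (2, (3, 4)))
for tree in (balanced, caterpillar):
    phi = cophenetic_values(tree)
    print([phi[pair] for pair in sorted(phi)])
# prints [2, 1, 0, 0, 2, 0, 0, 2, 1, 2]
# prints [1, 0, 0, 0, 2, 1, 1, 3, 2, 3]
```

Sorting the keys lists the values in lexicographic order of the pairs $(i,j)$ with $i\le j$, so each printed list contains the four depths and the six cophenetic values of the corresponding tree.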

Given two phylogenetic trees $T, T'$ on disjoint sets of taxa $S, S'$, respectively, we shall denote by $T\star T'$ the phylogenetic tree on $S\cup S'$ obtained by connecting the roots of $T$ and $T'$ to a (new) common root. Every phylogenetic tree $T\in\mathcal{T}_n$ is obtained as $T_k\star T'_{n-k}$, for some $1\le k\le n-1$, some subset $S_k\subseteq\{1,\ldots,n\}$ with $k$ elements, some tree $T_k$ on $S_k$ and some tree $T'_{n-k}$ on $S_k^c=\{1,\ldots,n\}\setminus S_k$. Actually, every phylogenetic tree in $\mathcal{T}_n$ is obtained in this way twice.

The Yule, or Equal-Rate Markov, model of evolution [14, 29] is a stochastic model of phylogenetic tree growth. It starts with a node, and at every step a leaf is chosen randomly and uniformly and is split into two leaves. Finally, the labels are assigned randomly and uniformly to the leaves once the desired number of leaves is reached. This corresponds to a model of evolution where, at each step, each currently extant species can give rise, with the same probability, to two new species. Under this stochastic model, if $T\in\mathcal{T}_n$ is a phylogenetic tree with set of internal nodes $V_{int}(T)$, and if for every $v\in V_{int}(T)$ we denote by $l_T(v)$ the number of its descendant leaves, then the probability of $T$ is [4, 27]
$$P_Y(T)=\frac{2^{n-1}}{n!}\prod_{v\in V_{int}(T)}\frac{1}{l_T(v)-1}.$$

The uniform, or Proportional to Distinguishable Arrangements, model [22] is another stochastic model of phylogenetic tree growth. Unlike the Yule model, its main feature is that all phylogenetic trees $T\in\mathcal{T}_n$ have the same probability:
$$P_U(T)=\frac{1}{(2n-3)!!},\quad\text{where } (2n-3)!!=(2n-3)(2n-5)\cdots 3\cdot 1.$$
From the point of view of tree growth, this model is described as the process that starts with a node labeled 1 and then, at the $k$-th step, a new pendant arc, ending in the leaf labeled $k+1$, is added either to a new root (whose other child will be, then, the original root) or to some edge, with all possible locations of this new pendant arc being equiprobable [9, 26].
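Both probability distributions can be checked on small cases by enumerating $\mathcal{T}_n$. The sketch below is an illustration, not the authors' code: it reuses the nested-pair encoding of trees as an assumption, generates all trees with the growth process described for the uniform model, and reads the Yule probability as $\frac{2^{n-1}}{n!}\prod_v 1/(l_T(v)-1)$. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def insertions(tree, leaf):
    """Yield every tree obtained by hanging `leaf` above the root or on an edge of `tree`."""
    yield (tree, leaf)  # new node just above the root of this (sub)tree
    if isinstance(tree, tuple):
        left, right = tree
        for grown in insertions(left, leaf):
            yield (grown, right)
        for grown in insertions(right, leaf):
            yield (left, grown)


def all_trees(n):
    """List all phylogenetic trees with leaves 1..n, adding one leaf at a time."""
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [grown for tree in trees for grown in insertions(tree, leaf)]
    return trees


def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[0]) + count_leaves(tree[1])


def double_factorial(m):
    """m!! for m >= -1, with (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def yule_probability(tree):
    """2^(n-1)/n! times the product of 1/(l_T(v) - 1) over the internal nodes v."""
    n = count_leaves(tree)
    probability = Fraction(2 ** (n - 1), factorial(n))
    pending = [tree]
    while pending:
        node = pending.pop()
        if isinstance(node, tuple):
            probability /= count_leaves(node) - 1
            pending.extend(node)
    return probability


def uniform_probability(n):
    return Fraction(1, double_factorial(2 * n - 3))


trees = all_trees(5)
print(len(trees), 1 / uniform_probability(5))    # prints 105 105
print(sum(yule_probability(t) for t in trees))   # prints 1
print(yule_probability(((1, 2), (3, 4))), yule_probability((1, (2, (3, 4)))))  # prints 1/9 1/18
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the check that the Yule probabilities add up to 1 is not blurred by rounding.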
Although this is not an explicit model of evolution, only of tree growth, several interpretations of it in terms of evolutionary processes have been given in the literature: see [3, p. 686] and the references therein.

3. Main results

Let $T\in\mathcal{T}_n$ be a phylogenetic tree with $n$ leaves. The cophenetic vector of $T$ is
$$\varphi(T)=\big(\varphi_T(i,j)\big)_{1\le i\le j\le n}\in\mathbb{R}^{n(n+1)/2},$$
with its elements lexicographically ordered in $(i,j)$. It turns out [8] that the mapping $\varphi:\mathcal{T}_n\to\mathbb{R}^{n(n+1)/2}$ sending each $T\in\mathcal{T}_n$ to its cophenetic vector $\varphi(T)$ is injective up to isomorphism. As is well known, this allows one to induce metrics on $\mathcal{T}_n$ from metrics defined on powers of $\mathbb{R}$. In particular, in this paper we consider the cophenetic metric $d_{\varphi,2}$ on $\mathcal{T}_n$ induced by the euclidean distance:
$$d_{\varphi,2}(T,T')=\sqrt{\sum_{1\le i\le j\le n}\big(\varphi_T(i,j)-\varphi_{T'}(i,j)\big)^2}.$$
To distinguish it from other cophenetic metrics obtained through other $L^p$ norms, we shall call it the euclidean cophenetic metric.

Example 1. Consider the phylogenetic trees $T,T'\in\mathcal{T}_4$ depicted in Fig. 1. Their cophenetic vectors are
$$\varphi(T)=(2,1,0,0,2,0,0,2,1,2),\qquad \varphi(T')=(1,0,0,0,2,1,1,3,2,3),$$
and therefore $d_{\varphi,2}(T,T')=\sqrt{7}$. As we shall see below, the expected values of the square of $d_{\varphi,2}$ on $\mathcal{T}_4$ under the uniform and the Yule models are, respectively, 10.56 and 9.41. Since $d_{\varphi,2}(T,T')^2=7$, these two trees are more similar than average with respect to the euclidean cophenetic metric under both models.

[Figure 1: Two phylogenetic trees with 4 leaves, $T$ (left) and $T'$ (right), both on the leaves 1, 2, 3, 4. According to the cophenetic vectors above, $T=((1,2),(3,4))$ and $T'=(1,(2,(3,4)))$.]

Let $D_n^2$ be the random variable that chooses a pair of trees $T,T'\in\mathcal{T}_n$ and computes $d_{\varphi,2}(T,T')^2$. Its expected values under the Yule and the uniform models are given by the following two theorems. Recall that the $n$-th harmonic number $H_n$ is defined as $H_n=\sum_{i=1}^n 1/i$.

Theorem 2. For every $n\ge 2$, the expected value of $D_n^2$ under the Yule model is
$$E_Y(D_n^2)=\frac{2n}{n-1}\Big(3n^2-10n-1+8(n+1)H_n-4(n+1)H_n^2\Big).$$

Theorem 3. For every $n\ge 2$, the expected value of $D_n^2$ under the uniform model is
$$E_U(D_n^2)=\frac{1}{3}\big(4n^3+18n^2-10n\big)-\frac{n(n+3)}{2}\cdot\frac{(2n-2)!!}{(2n-3)!!}-\frac{n(n+7)}{4}\left(\frac{(2n-2)!!}{(2n-3)!!}\right)^2.$$

Since $H_n\sim\ln(n)$ and $(2n-2)!!/(2n-3)!!\sim\sqrt{\pi n}$, these formulas imply that
$$E_Y(D_n^2)\sim 6n^2,\qquad E_U(D_n^2)\sim\Big(\frac{4}{3}-\frac{\pi}{4}\Big)n^3.$$

We shall prove the formulas in Theorems 2 and 3 by reducing the computation of the expected value of $D_n^2$ to that of the following random variables:

- $S_n$, the random variable that chooses a tree $T\in\mathcal{T}_n$ and computes its Sackin index $S$ [23], defined by $S(T)=\sum_{i=1}^n\delta_T(i)$.
- $\Phi_n$, the random variable that chooses a tree $T\in\mathcal{T}_n$ and computes its total cophenetic index $\Phi$ [18], defined by $\Phi(T)=\sum_{1\le i<j\le n}\varphi_T(i,j)$.
- $\Phi^{(2)}_n$, the random variable that chooses a tree $T\in\mathcal{T}_n$ and computes $\Phi^{(2)}(T)=\sum_{1\le i\le j\le n}\varphi_T(i,j)^2$.

For the models under consideration, the expected values of these variables are related to that of $D_n^2$ by the next proposition. In it and henceforth, we shall denote by $E(X)$ the expected value of a random variable $X$ on $\mathcal{T}_n$ under a generic probability distribution $p:\mathcal{T}_n\to[0,1]$ on $\mathcal{T}_n$ invariant under relabelings. The probability distributions $p_Y$ and $p_U$ defined by the Yule and the uniform models, respectively, are invariant under relabelings, and therefore the expected values under these specific models, which will be denoted by $E_Y$ and $E_U$, respectively, are special cases of $E$.

Proposition 4.
$$E(D_n^2)=2E(\Phi^{(2)}_n)-\frac{2}{n}E(S_n)^2-\frac{4}{n(n-1)}E(\Phi_n)^2.$$

Proof. To simplify the notations, let $\varphi_n$ be the random variable that chooses a tree $T\in\mathcal{T}_n$ and computes $\varphi_T(1,2)$, and let $\delta_n$ be the random variable that chooses a tree $T\in\mathcal{T}_n$ and computes $\delta_T(1)$. Let us now compute $E(D_n^2)$ from its very definition:
$$\begin{aligned}
E(D_n^2) &= \sum_{(T,T')\in\mathcal{T}_n^2} d_{\varphi,2}(T,T')^2\,p(T)p(T')\\
&= \sum_{(T,T')\in\mathcal{T}_n^2}\Big(\sum_{1\le i\le j\le n}\big(\varphi_T(i,j)-\varphi_{T'}(i,j)\big)^2\Big)p(T)p(T')\\
&= \sum_{1\le i\le j\le n}\ \sum_{(T,T')\in\mathcal{T}_n^2}\big(\varphi_T(i,j)^2+\varphi_{T'}(i,j)^2-2\varphi_T(i,j)\varphi_{T'}(i,j)\big)p(T)p(T')\\
&= \sum_{1\le i\le j\le n}\Big(\sum_{T\in\mathcal{T}_n}\varphi_T(i,j)^2p(T)+\sum_{T'\in\mathcal{T}_n}\varphi_{T'}(i,j)^2p(T')-2\Big(\sum_{T\in\mathcal{T}_n}\varphi_T(i,j)p(T)\Big)\Big(\sum_{T'\in\mathcal{T}_n}\varphi_{T'}(i,j)p(T')\Big)\Big)\\
&= 2\sum_{1\le i\le j\le n}\Big(\sum_{T\in\mathcal{T}_n}\varphi_T(i,j)^2p(T)-\Big(\sum_{T\in\mathcal{T}_n}\varphi_T(i,j)p(T)\Big)^2\Big)\\
&= 2\Big(\sum_{T\in\mathcal{T}_n}\Big(\sum_{1\le i\le j\le n}\varphi_T(i,j)^2\Big)p(T)-\sum_{1\le i<j\le n}\Big(\sum_{T\in\mathcal{T}_n}\varphi_T(i,j)p(T)\Big)^2-\sum_{1\le i\le n}\Big(\sum_{T\in\mathcal{T}_n}\varphi_T(i,i)p(T)\Big)^2\Big)\\
&= 2\Big(\sum_{T\in\mathcal{T}_n}\Phi^{(2)}(T)p(T)-\binom{n}{2}\Big(\sum_{T\in\mathcal{T}_n}\varphi_T(1,2)p(T)\Big)^2-n\Big(\sum_{T\in\mathcal{T}_n}\delta_T(1)p(T)\Big)^2\Big)\\
&= 2E(\Phi^{(2)}_n)-n(n-1)E(\varphi_n)^2-2nE(\delta_n)^2.
\end{aligned}$$
In the second to last equality we have used that, by the invariance of $p$ under relabelings, $\sum_T\varphi_T(i,j)p(T)$ does not depend on the pair $i<j$ and $\sum_T\varphi_T(i,i)p(T)$ does not depend on $i$. For the same reason, the values of $E(\delta_n)$ and $E(\varphi_n)$ can be easily obtained from $E(S_n)$ and $E(\Phi_n)$, respectively:
$$E(\delta_n)=E(S_n)/n,\qquad E(\varphi_n)=E(\Phi_n)\Big/\binom{n}{2}.$$
The formula in the statement is then obtained by replacing $E(\delta_n)$ and $E(\varphi_n)$ by these values.

The expected values of $S_n$ and $\Phi_n$ under the Yule and the uniform models are known:
$$E_Y(S_n)=2n(H_n-1),\qquad E_U(S_n)=n\Big(\frac{(2n-2)!!}{(2n-3)!!}-1\Big),$$
$$E_Y(\Phi_n)=n(n-1)-2n(H_n-1),\qquad E_U(\Phi_n)=\frac{1}{2}\binom{n}{2}\Big(\frac{(2n-2)!!}{(2n-3)!!}-2\Big).$$
The formula for $E_Y(S_n)$ was proved in [15] and the other three, in [18]. To obtain the expected values of $D_n^2$, it remains to compute the expected values of $\Phi^{(2)}_n$. They are given by the following result.

Proposition 5. For every $n\ge 1$,
(a) $E_Y(\Phi^{(2)}_n)=5n(n-1)-8n(H_n-1)$
(b) $E_U(\Phi^{(2)}_n)=\dfrac{1}{6}n(4n^2+21n-7)-\dfrac{3}{4}n(n+3)\dfrac{(2n-2)!!}{(2n-3)!!}$

This proposition is proved in the Appendix at the end of this paper. Finally, the identities given in Theorems 2 and 3 are obtained by replacing, in the identity given in Proposition 4, $E(S_n)$, $E(\Phi_n)$, and $E(\Phi^{(2)}_n)$ by their values under the Yule and the uniform models, respectively. We leave the last details to the reader.

4. An experiment on TreeBASE

In this section we report on a very simple experiment to show how $d_{\varphi,2}$ can be used to test evolutionary hypotheses. In this experiment, we have compared the expected value of $d_{\varphi,2}^2$ on $\mathcal{T}_n$ under the uniform and the Yule models with its average value on the set $\mathrm{TreeBASE}_{bin,n}$ of binary phylogenetic trees with $n$ leaves contained in TreeBASE [20].

To perform this experiment, we have made some decisions. First, since there are only very few values $n>50$ such that $|\mathrm{TreeBASE}_{bin,n}|>0$, we have decided to consider only those binary trees contained in TreeBASE with $n\le 50$ leaves. On the other hand, even for those $n$ such that $\mathrm{TreeBASE}_{bin,n}$ is relatively large, in most cases it does not contain many pairs of trees with the same taxa. So, instead of computing the average value of $d_{\varphi,2}^2$ on $\mathrm{TreeBASE}_{bin,n}$ by averaging the values $d_{\varphi,2}(T,T')^2$ for pairs of trees $T,T'$ with exactly the same $n$ taxa, we have made use of the formula given in Proposition 4, as if $\mathrm{TreeBASE}_{bin,n}$ were closed under relabelings: that is, we have taken into account only the shapes of the trees contained in it. This is consistent with the fact that our final goal is to test models of evolution that produce tree shapes. So, we have computed the average values of $\Phi^{(2)}$, of the Sackin index $S$, and of the total cophenetic index $\Phi$ on $\mathrm{TreeBASE}_{bin,n}$, and we have taken as average value of $d_{\varphi,2}^2$ on this set the result of applying the formula in Proposition 4. The detailed results of these computations, as well as the Python and R scripts used to compute and analyze them, are available in the Supplementary Material web page.

Fig. 2 plots the log of these average values as a function of $\log(n)$.
We have added the curves of the log of the expected values of $D_n^2$ under the Yule distribution (lower, dotted curve) and under the uniform distribution (upper, dashed curve), again as a function of $\log(n)$. The graphic shows that the expected value of $d_{\varphi,2}^2$ on (the shapes of) the phylogenetic trees contained in TreeBASE is better explained by the uniform model than by the Yule model. This agrees with the results of similar experiments using other measures (see, for instance, [3, 18]).

5. Conclusions and discussion

In this paper we have obtained formulas for the expected values under the Yule and the uniform models of the square of the euclidean cophenetic metric $d_{\varphi,2}$, defined by the euclidean distance between cophenetic vectors. These formulas are explicit and hold on spaces $\mathcal{T}_n$ of fully resolved phylogenetic trees with any number $n$ of leaves.
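The two closed formulas can be evaluated exactly and compared with a brute-force average over all pairs of trees for small $n$. The following Python sketch is an illustration under stated assumptions and is not the authors' supplementary script: trees are encoded as nested pairs, all function names are hypothetical, and the formulas are read as $E_Y(D_n^2)=\frac{2n}{n-1}(3n^2-10n-1+8(n+1)H_n-4(n+1)H_n^2)$ and $E_U(D_n^2)=\frac{1}{3}(4n^3+18n^2-10n)-\frac{n(n+3)}{2}r-\frac{n(n+7)}{4}r^2$ with $r=(2n-2)!!/(2n-3)!!$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def insertions(tree, leaf):
    """Yield every tree obtained by hanging `leaf` above the root or on an edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        for grown in insertions(tree[0], leaf):
            yield (grown, tree[1])
        for grown in insertions(tree[1], leaf):
            yield (tree[0], grown)


def all_trees(n):
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [grown for tree in trees for grown in insertions(tree, leaf)]
    return trees


def leaves(tree):
    return [tree] if not isinstance(tree, tuple) else leaves(tree[0]) + leaves(tree[1])


def cophenetic_vector(tree):
    """Values phi_T(i, j), i <= j, in lexicographic order of (i, j)."""
    values = {}

    def visit(node, depth):
        if not isinstance(node, tuple):
            values[(node, node)] = depth
            return
        for i in leaves(node[0]):
            for j in leaves(node[1]):
                values[(min(i, j), max(i, j))] = depth
        visit(node[0], depth + 1)
        visit(node[1], depth + 1)

    visit(tree, 0)
    return [values[pair] for pair in sorted(values)]


def yule_probability(tree):
    n = len(leaves(tree))
    probability = Fraction(2 ** (n - 1), factorial(n))
    pending = [tree]
    while pending:
        node = pending.pop()
        if isinstance(node, tuple):
            probability /= len(leaves(node)) - 1
            pending.extend(node)
    return probability


def double_factorial(m):
    result = 1
    while m > 1:
        result, m = result * m, m - 2
    return result


def expected_yule(n):
    h = sum(Fraction(1, i) for i in range(1, n + 1))
    return Fraction(2 * n, n - 1) * (3 * n**2 - 10 * n - 1 + 8 * (n + 1) * h - 4 * (n + 1) * h**2)


def expected_uniform(n):
    r = Fraction(double_factorial(2 * n - 2), double_factorial(2 * n - 3))
    return (Fraction(4 * n**3 + 18 * n**2 - 10 * n, 3)
            - Fraction(n * (n + 3), 2) * r - Fraction(n * (n + 7), 4) * r**2)


def brute_force(n, probability):
    """Exact E(D_n^2): weighted average of the squared distance over all ordered pairs."""
    weighted = [(cophenetic_vector(t), probability(t)) for t in all_trees(n)]
    return sum(p * q * sum((a - b) ** 2 for a, b in zip(u, v))
               for (u, p), (v, q) in product(weighted, repeat=2))


for n in range(3, 6):
    size = double_factorial(2 * n - 3)
    assert brute_force(n, yule_probability) == expected_yule(n)
    assert brute_force(n, lambda tree: Fraction(1, size)) == expected_uniform(n)
```

For $n=4$ the two functions return $254/27\approx 9.41$ and $264/25=10.56$. The enumeration grows as $(2n-3)!!$, so this direct check is only practical for a handful of leaves.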

[Figure 2: Log-log plots of the mean of $D_n^2$ for the binary trees in TreeBASE with a fixed number $n$ of leaves, of $E_Y(D_n^2)$ (dotted curve) and $E_U(D_n^2)$ (dashed curve). Horizontal axis: number of leaves. Vertical axis: log of the expected value of the square of the distance.]

These formulas have been obtained through long algebraic manipulations of sums of sequences. To double-check our results, we have computed the exact value of $E_Y(D_n^2)$ and $E_U(D_n^2)$ for $n=3,\ldots,7$, by generating all trees with up to 7 leaves. Moreover, we have computed numerical approximations to these values for $n=10,20,\ldots,100$, by generating pairs of random trees until the numerical method stabilizes. These numerical experiments confirm that our formulas give the right figures. Table 1 gives the exact values for $n=3,\ldots,7$. The results of the simulations for $n=10,20,\ldots,100$, as well as the Python scripts used in these computations, are also available in the aforementioned Supplementary Material web page (…/expectedcophdist/).

[Table 1: Values of $E_Y(D_n^2)$ and $E_U(D_n^2)$ for $n=3,\ldots,7$. They agree with those given by our formulas.]

The formulas for $E_Y(D_n^2)$ and $E_U(D_n^2)$ grow in different orders: $E_Y(D_n^2)$ is in $\Theta(n^2)$, while $E_U(D_n^2)$ is in $\Theta(n^3)$. Therefore, they can be used to test the Yule and the uniform models as null stochastic models of evolution for collections of phylogenetic trees reconstructed by different methods. We have reported on a first experiment of this type, which reinforces the conclusion that real world phylogenetic trees (that is, those contained in TreeBASE) are not consistent with the Yule model of evolution. We plan to report in a future paper on more extensive tests on stochastic models of evolutionary processes, including Ford's α-model [11] and Aldous' β-model [2].

Acknowledgements

The research reported in this paper has been partially supported by the Spanish government and the EU FEDER program, through projects MTM and TIN E.

References

[1] M. Abramowitz, I. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover (1964).
[2] D. Aldous, Probability distributions on cladograms. Random discrete structures, IMA Vol. Math. Appl. 76 (Springer, 1996), 1-18.
[3] M. G. B. Blum, O. François, Which random processes describe the Tree of Life? A large-scale study of phylogenetic tree imbalance. Syst. Biol. 55 (2006).
[4] J. Brown, Probabilities of evolutionary trees. Syst. Biol. 43 (1994).
[5] D. Bryant, M. Steel, Computing the distribution of a tree metric. IEEE/ACM Trans. Comp. Biol. Bioinf. 6 (2009).
[6] G. Cardona, A. Mir, F. Rosselló, The expected value under the Yule model of the squared path-difference distance. Appl. Math. Lett. 25 (2012).
[7] G. Cardona, A. Mir, F. Rosselló, Exact formulas for the variance of several balance indices under the Yule model. To appear in J. Math. Biol. http://dx.doi.org/10.1007/s…
[8] G. Cardona, A. Mir, L. Rotger, F. Rosselló, D. Sánchez, Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf. BMC Bioinformatics 14:3 (2013).
[9] L. L. Cavalli-Sforza, A. Edwards, Phylogenetic analysis. Models and estimation procedures. Am. J. Hum. Genet. 19 (1967).
[10] J. Felsenstein, Inferring Phylogenies. Sinauer Associates Inc., 2004.
[11] D. Ford, Probabilities on cladograms: Introduction to the alpha model. arXiv:math/0511246 [math.PR] (2005).
[12] Hypergeometric3F/03/07/0.

[13] HypergeometricF/03/03/0.
[14] E. Harding, The probabilities of rooted tree-shapes generated by random bifurcation. Adv. Appl. Prob. 3 (1971).
[15] S. B. Heard, Patterns in Tree Balance among Cladistic, Phenetic, and Randomly Generated Phylogenetic Trees. Evolution 46 (1992).
[16] M. Hendy, C. Little, D. Penny, Comparing Trees with Pendant Vertices Labelled. SIAM J. Applied Math. 44 (1984).
[17] A. Mir, F. Rosselló, The mean value of the squared path-difference distance for rooted phylogenetic trees. J. Math. Anal. Appl. 371 (2010).
[18] A. Mir, F. Rosselló, L. Rotger, A new balance index for phylogenetic trees. Math. Biosc. 241 (2013).
[19] A. Mooers, S. B. Heard, Inferring evolutionary process from phylogenetic tree shape. Quart. Rev. Biol. 72 (1997).
[20] V. Morell, TreeBASE: the roots of phylogeny. Science 273 (1996).
[21] M. Petkovšek, H. Wilf, D. Zeilberger, A = B. AK Peters Ltd. (1996). Available online.
[22] D. E. Rosen, Vicariant Patterns and Historical Explanation in Biogeography. Syst. Biol. 27 (1978).
[23] M. J. Sackin, Good and bad phenograms. Sys. Zool. 21 (1972), 225-226.
[24] R. Sokal, F. Rohlf, The Comparison of Dendrograms by Objective Methods. Taxon 11 (1962).
[25] M. Steel, Distribution of the symmetric difference metric on phylogenetic trees. SIAM J. Discr. Math. 1 (1988).
[26] M. Steel, A. McKenzie, Distributions of cherries for two models of trees. Math. Biosc. 164 (2000), 81-92.
[27] M. Steel, A. McKenzie, Properties of phylogenetic trees generated by Yule-type speciation models. Math. Biosc. 170 (2001), 91-112.
[28] M. A. Steel, D. Penny, Distributions of tree comparison metrics: some new results. Syst. Biol. 42 (2) (1993), 126-141.
[29] G. U. Yule, A mathematical theory of evolution based on the conclusions of Dr J. C. Willis. Phil. Trans. Royal Soc. (London) Series B 213 (1924), 21-87.

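Appendix: Proof of Proposition 5

Proof of Proposition 5.(a)

For every $T\in\mathcal{T}_n$, let
$$\overline{\Phi}(T)=S(T)+\Phi(T)=\sum_{1\le i\le j\le n}\varphi_T(i,j),$$
and let $\overline{\Phi}_n$ be the random variable that chooses a tree $T\in\mathcal{T}_n$ and computes $\overline{\Phi}(T)$. We have that
$$E_Y(\overline{\Phi}_n)=E_Y(S_n)+E_Y(\Phi_n)=n(n-1).$$

To compute $E_Y(\Phi^{(2)}_n)$, we shall use an argument similar to the one used in the proof of [6, Prop. 3]. Notice that
$$E_Y(\Phi^{(2)}_n)=\sum_{T\in\mathcal{T}_n}\Phi^{(2)}(T)\,p_Y(T)=\frac{1}{2}\sum_{k=1}^{n-1}\ \sum_{\substack{S_k\subseteq\{1,\ldots,n\}\\ |S_k|=k}}\ \sum_{T_k\in\mathcal{T}(S_k)}\ \sum_{T'_{n-k}\in\mathcal{T}(S_k^c)}\Phi^{(2)}(T_k\star T'_{n-k})\,p_Y(T_k\star T'_{n-k}).$$
Now, on the one hand, we have the following easy lemma on $P_Y(T\star T')$: see [7, Lem. ].

Lemma 6. Let $S_k\subseteq\{1,\ldots,n\}$ with $|S_k|=k$, let $T_k\in\mathcal{T}(S_k)$ and $T'_{n-k}\in\mathcal{T}(S_k^c)$. Then,
$$P_Y(T_k\star T'_{n-k})=\frac{2}{(n-1)\binom{n}{k}}P_Y(T_k)P_Y(T'_{n-k}).$$

On the other hand, we have the following recursive expression for $\Phi^{(2)}(T\star T')$.

Lemma 7. Let $S_k\subseteq\{1,\ldots,n\}$ with $|S_k|=k$, let $T_k\in\mathcal{T}(S_k)$ and $T'_{n-k}\in\mathcal{T}(S_k^c)$. Then
$$\Phi^{(2)}(T_k\star T'_{n-k})=\Phi^{(2)}(T_k)+\Phi^{(2)}(T'_{n-k})+2\overline{\Phi}(T_k)+2\overline{\Phi}(T'_{n-k})+\binom{k+1}{2}+\binom{n-k+1}{2}.$$

Proof. Let us assume, without any loss of generality, that $S_k=\{1,\ldots,k\}$ and $S_k^c=\{k+1,\ldots,n\}$. Then
$$\varphi_{T_k\star T'_{n-k}}(i,j)=\begin{cases}\varphi_{T_k}(i,j)+1 & \text{if } 1\le i,j\le k\\ \varphi_{T'_{n-k}}(i,j)+1 & \text{if } k+1\le i,j\le n\\ 0 & \text{otherwise}\end{cases}$$
and therefore
$$\begin{aligned}
\Phi^{(2)}(T_k\star T'_{n-k}) &= \sum_{1\le i\le j\le n}\varphi_{T_k\star T'_{n-k}}(i,j)^2\\
&= \sum_{1\le i\le j\le k}\big(\varphi_{T_k}(i,j)+1\big)^2+\sum_{k+1\le i\le j\le n}\big(\varphi_{T'_{n-k}}(i,j)+1\big)^2\\
&= \sum_{1\le i\le j\le k}\big(\varphi_{T_k}(i,j)^2+2\varphi_{T_k}(i,j)+1\big)+\sum_{k+1\le i\le j\le n}\big(\varphi_{T'_{n-k}}(i,j)^2+2\varphi_{T'_{n-k}}(i,j)+1\big)\\
&= \Phi^{(2)}(T_k)+2\overline{\Phi}(T_k)+\binom{k+1}{2}+\Phi^{(2)}(T'_{n-k})+2\overline{\Phi}(T'_{n-k})+\binom{n-k+1}{2}.
\end{aligned}$$

So, if we set $f(a,b)=\binom{a+1}{2}+\binom{b+1}{2}$, we have that
$$\begin{aligned}
E_Y(\Phi^{(2)}_n) &= \frac{1}{2}\sum_{k=1}^{n-1}\binom{n}{k}\sum_{T_k\in\mathcal{T}_k}\sum_{T'_{n-k}\in\mathcal{T}_{n-k}}\Big[\Phi^{(2)}(T_k)+\Phi^{(2)}(T'_{n-k})+2\big(\overline{\Phi}(T_k)+\overline{\Phi}(T'_{n-k})\big)+f(k,n-k)\Big]\frac{2}{(n-1)\binom{n}{k}}P_Y(T_k)P_Y(T'_{n-k})\\
&= \frac{1}{n-1}\sum_{k=1}^{n-1}\Big[\sum_{T_k}\sum_{T'_{n-k}}\Phi^{(2)}(T_k)P_Y(T_k)P_Y(T'_{n-k})+\sum_{T_k}\sum_{T'_{n-k}}\Phi^{(2)}(T'_{n-k})P_Y(T_k)P_Y(T'_{n-k})\\
&\qquad\qquad+2\sum_{T_k}\sum_{T'_{n-k}}\overline{\Phi}(T_k)P_Y(T_k)P_Y(T'_{n-k})+2\sum_{T_k}\sum_{T'_{n-k}}\overline{\Phi}(T'_{n-k})P_Y(T_k)P_Y(T'_{n-k})\\
&\qquad\qquad+\sum_{T_k}\sum_{T'_{n-k}}f(k,n-k)P_Y(T_k)P_Y(T'_{n-k})\Big]\\
&= \frac{1}{n-1}\sum_{k=1}^{n-1}\Big[\sum_{T_k}\Phi^{(2)}(T_k)P_Y(T_k)+\sum_{T'_{n-k}}\Phi^{(2)}(T'_{n-k})P_Y(T'_{n-k})+2\sum_{T_k}\overline{\Phi}(T_k)P_Y(T_k)+2\sum_{T'_{n-k}}\overline{\Phi}(T'_{n-k})P_Y(T'_{n-k})+f(k,n-k)\Big]\\
&= \frac{1}{n-1}\sum_{k=1}^{n-1}\Big[E_Y(\Phi^{(2)}_k)+E_Y(\Phi^{(2)}_{n-k})+2E_Y(\overline{\Phi}_k)+2E_Y(\overline{\Phi}_{n-k})+\binom{k+1}{2}+\binom{n-k+1}{2}\Big]\\
&= \frac{2}{n-1}\sum_{k=1}^{n-1}E_Y(\Phi^{(2)}_k)+\frac{4}{n-1}\sum_{k=1}^{n-1}E_Y(\overline{\Phi}_k)+\frac{n(n+1)}{3}.
\end{aligned}$$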
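The recurrence just obtained can be iterated numerically and compared with the closed form stated in Proposition 5(a). The short Python sketch below is only an illustrative check with hypothetical function names: it assumes the recurrence reads $E_Y(\Phi^{(2)}_n)=\frac{2}{n-1}\sum_{k=1}^{n-1}E_Y(\Phi^{(2)}_k)+\frac{4}{n-1}\sum_{k=1}^{n-1}E_Y(\overline{\Phi}_k)+\frac{n(n+1)}{3}$, with $E_Y(\overline{\Phi}_k)=k(k-1)$ and $E_Y(\Phi^{(2)}_1)=0$.

```python
from fractions import Fraction


def harmonic(n):
    return sum(Fraction(1, i) for i in range(1, n + 1))


def phi2_by_recurrence(n_max):
    """Iterate the recurrence for E_Y(Phi^(2)_n), starting from E_Y(Phi^(2)_1) = 0."""
    phi2 = {1: Fraction(0)}
    for n in range(2, n_max + 1):
        sum_phi2 = sum(phi2[k] for k in range(1, n))
        sum_phibar = sum(k * (k - 1) for k in range(1, n))  # E_Y(Phibar_k) = k(k-1)
        phi2[n] = (Fraction(2, n - 1) * sum_phi2
                   + Fraction(4, n - 1) * sum_phibar
                   + Fraction(n * (n + 1), 3))
    return phi2


def phi2_closed_form(n):
    """Closed form of Proposition 5(a): 5n(n-1) - 8n(H_n - 1)."""
    return 5 * n * (n - 1) - 8 * n * (harmonic(n) - 1)


values = phi2_by_recurrence(30)
assert all(values[n] == phi2_closed_form(n) for n in values)
print(values[4])  # prints 76/3
```

Rational arithmetic keeps the comparison exact, so agreement for every $n\le 30$ is an equality of fractions rather than a floating-point coincidence.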

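In particular,
$$E_Y(\Phi^{(2)}_{n-1})=\frac{2}{n-2}\sum_{k=1}^{n-2}E_Y(\Phi^{(2)}_k)+\frac{4}{n-2}\sum_{k=1}^{n-2}E_Y(\overline{\Phi}_k)+\frac{n(n-1)}{3}$$
and therefore
$$\begin{aligned}
E_Y(\Phi^{(2)}_n) &= \frac{2}{n-1}\sum_{k=1}^{n-2}E_Y(\Phi^{(2)}_k)+\frac{2}{n-1}E_Y(\Phi^{(2)}_{n-1})+\frac{4}{n-1}\sum_{k=1}^{n-2}E_Y(\overline{\Phi}_k)+\frac{4}{n-1}E_Y(\overline{\Phi}_{n-1})+\frac{n(n+1)}{3}\\
&= \frac{n-2}{n-1}\Big(E_Y(\Phi^{(2)}_{n-1})-\frac{n(n-1)}{3}\Big)+\frac{2}{n-1}E_Y(\Phi^{(2)}_{n-1})+\frac{4}{n-1}E_Y(\overline{\Phi}_{n-1})+\frac{n(n+1)}{3}\\
&= \frac{n}{n-1}E_Y(\Phi^{(2)}_{n-1})+4(n-2)-\frac{n(n-2)}{3}+\frac{n(n+1)}{3}\\
&= \frac{n}{n-1}E_Y(\Phi^{(2)}_{n-1})+5n-8.
\end{aligned}$$
Setting $x_n=E_Y(\Phi^{(2)}_n)/n$, this recurrence becomes
$$x_n=x_{n-1}+5-\frac{8}{n},$$
and the solution of this recursive equation with $x_1=E_Y(\Phi^{(2)}_1)=0$ is
$$x_n=\sum_{k=2}^{n}\Big(5-\frac{8}{k}\Big)=5(n-1)-8(H_n-1)=5n+3-8H_n,$$
from where we deduce that $E_Y(\Phi^{(2)}_n)=5n^2+3n-8nH_n$, as we claimed.

Proof of Proposition 5.(b)

To compute $E_U(\Phi^{(2)}_n)$, we shall use an argument similar to the one used in [7]. For every $k=1,\ldots,n-1$, let
$$f_{k,n}=\big|\{T\in\mathcal{T}_n\mid\varphi_T(1,2)=k\}\big|=\big|\{T\in\mathcal{T}_n\mid\varphi_T(i,j)=k\}\big|\quad\text{for every } 1\le i<j\le n,$$
$$d_{k,n}=\big|\{T\in\mathcal{T}_n\mid\delta_T(1)=k\}\big|=\big|\{T\in\mathcal{T}_n\mid\delta_T(i)=k\}\big|\quad\text{for every } 1\le i\le n$$
(where $|X|$ denotes the cardinal of the set $X$).

Lemma 8. For every $n\ge 2$,
$$E_U(\Phi^{(2)}_n)=\frac{1}{(2n-3)!!}\Big(n\sum_{k=1}^{n-1}k^2d_{k,n}+\binom{n}{2}\sum_{k=1}^{n-2}k^2f_{k,n}\Big).$$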

Proof. Under the uniform model,
$$E_U(\Phi^{(2)}_n)=\frac{\sum_{T\in\mathcal{T}_n}\Phi^{(2)}(T)}{(2n-3)!!},$$
where
$$\begin{aligned}
\sum_{T\in\mathcal{T}_n}\Phi^{(2)}(T) &= \sum_{T\in\mathcal{T}_n}\ \sum_{1\le i\le j\le n}\varphi_T(i,j)^2=\sum_{1\le i\le n}\ \sum_{T\in\mathcal{T}_n}\delta_T(i)^2+\sum_{1\le i<j\le n}\ \sum_{T\in\mathcal{T}_n}\varphi_T(i,j)^2\\
&= \sum_{1\le i\le n}\ \sum_{k=1}^{n-1}k^2\,\big|\{T\in\mathcal{T}_n\mid\delta_T(i)=k\}\big|+\sum_{1\le i<j\le n}\ \sum_{k=1}^{n-2}k^2\,\big|\{T\in\mathcal{T}_n\mid\varphi_T(i,j)=k\}\big|\\
&= \sum_{1\le i\le n}\ \sum_{k=1}^{n-1}k^2d_{k,n}+\sum_{1\le i<j\le n}\ \sum_{k=1}^{n-2}k^2f_{k,n}\\
&= n\sum_{k=1}^{n-1}k^2d_{k,n}+\binom{n}{2}\sum_{k=1}^{n-2}k^2f_{k,n}.
\end{aligned}$$

A formula for $d_{k,n}$ was obtained in the proof of [8, Lem. ]:
$$d_{k,n}=\frac{(2n-k-3)!\,k}{(n-k-1)!\,2^{n-k-1}}.\qquad(1)$$
As far as $f_{k,n}$ goes, we have the following result. In it, and henceforth, ${}_pF_q$ denotes the (generalized) hypergeometric function defined by
$${}_pF_q\Big(\begin{matrix}a_1,\ldots,a_p\\ b_1,\ldots,b_q\end{matrix};z\Big)=\sum_{k\ge 0}\frac{(a_1)_k\cdots(a_p)_k}{(b_1)_k\cdots(b_q)_k}\cdot\frac{z^k}{k!},$$
where $(a)_0=1$ and $(a)_k:=a(a+1)\cdots(a+k-1)$ for $k\ge 1$.

Lemma 9. For every $n\ge 2$, $f_{0,n}=(2n-4)!!$ and
$$f_{k,n}=\frac{(2n-k-5)!\,k}{(2n-2k-4)!!}\;{}_3F_2\Big(\begin{matrix}1,\ 2-n,\ k+2-n\\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3\end{matrix};1\Big)$$
for every $k=1,\ldots,n-2$.

Proof. Let us start by proving $f_{0,n}=(2n-4)!!$ by induction on $n$. It is clear that $f_{0,2}=1=(2\cdot 2-4)!!$. Assume now that $f_{0,n-1}=(2(n-1)-4)!!$. Every phylogenetic tree $T$ with $n$ leaves such that $\varphi_T(1,2)=0$, that is, where $\mathrm{LCA}_T(1,2)$ is the root, is obtained by taking a phylogenetic tree $T'$ with $n-1$ leaves such that $\varphi_{T'}(1,2)=0$ and adding a new pendant edge, ending in the leaf $n$, to any edge in $T'$. Then, since there are $f_{0,n-1}=(2n-6)!!$ trees $T'\in\mathcal{T}_{n-1}$ such that $\varphi_{T'}(1,2)=0$, and each one of them has $2(n-2)$ edges where we can add the new edge, we obtain
$$f_{0,n}=(2n-4)(2n-6)!!=(2n-4)!!.$$

Now, to compute $f_{k,n}$ for $k\ge 1$, we shall study the structure of a tree $T\in\mathcal{T}_n$ such that $\varphi_T(1,2)=k$. To simplify the notations, let us denote by $x$ the node $\mathrm{LCA}_T(1,2)$, which has depth $k$, and by $T_0$ the subtree of $T$ rooted at $x$. Then, on the one hand, $T_0$ is a phylogenetic tree on a subset $S_0\subseteq\{1,\ldots,n\}$ containing 1 and 2, and since its root $x$ is the LCA of 1 and 2 in $T$, we have that $\varphi_{T_0}(1,2)=0$. On the other hand, there is a path $(r=v_1,v_2,v_3,\ldots,v_{k+1}=x)$ in $T$ from the root $r$ to $x$. For every $j=1,\ldots,k$, let $T_j$ be the subtree rooted at the child of $v_j$ other than $v_{j+1}$; see Fig. 3. So, the tree $T$ is determined by:

- A number $0\le m\le n-k-2$, so that $m+2$ will be the number of leaves of the phylogenetic tree $T_0$ rooted at $\mathrm{LCA}_T(1,2)$.
- A subset $\{i_1,\ldots,i_m\}$ of $\{3,\ldots,n\}$. There are $\binom{n-2}{m}$ such subsets.
- A phylogenetic tree $T_0$ on $\{1,2,i_1,\ldots,i_m\}$ such that $\varphi_{T_0}(1,2)=0$. There are $f_{0,m+2}=(2m)!!$ such trees.
- An ordered $k$-forest, that is, an ordered sequence of phylogenetic trees $(T_1,T_2,\ldots,T_k)$ such that $\bigcup_{i=1}^kL(T_i)=\{1,\ldots,n\}\setminus\{1,2,i_1,\ldots,i_m\}$. The number of such ordered $k$-forests is (see, for instance, [7, Lem. ])
$$\frac{(2n-2m-k-5)!\,k}{(n-m-k-2)!\,2^{n-m-k-2}}.$$

[Figure 3: The structure of a tree $T$ with $\varphi_T(1,2)=k$: the subtrees $T_1,\ldots,T_k$ hang from the path that goes from the root to $x$, and $T_0$ is the subtree rooted at $x$.]

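This shows that $f_{k,n}$ can be computed as
$$\begin{aligned}
f_{k,n} &= \sum_{m=0}^{n-k-2}(\text{number of ways of choosing } \{i_1,\ldots,i_m\})\cdot(\text{number of trees in } \mathcal{T}_{m+2} \text{ with } \varphi_T(1,2)=0)\cdot(\text{number of ordered } k\text{-forests on } n-m-2 \text{ leaves})\\
&= \sum_{m=0}^{n-k-2}\binom{n-2}{m}(2m)!!\,\frac{(2n-2m-k-5)!\,k}{(n-m-k-2)!\,2^{n-m-k-2}}\\
&= \sum_{m=0}^{n-k-2}\frac{(n-2)!\,m!\,2^m}{m!\,(n-m-2)!}\cdot\frac{(2n-2m-k-5)!\,k}{(n-m-k-2)!\,2^{n-m-k-2}}\\
&= \frac{(n-2)!\,k}{2^{n-k-2}}\sum_{m=0}^{n-k-2}\frac{4^m\,(2n-2m-k-5)!}{(n-m-2)!\,(n-m-k-2)!}.
\end{aligned}$$
Now, taking into account that
$$(1)_m=m!,\qquad(2-n)_m=(-1)^m\frac{(n-2)!}{(n-m-2)!},\qquad(k+2-n)_m=(-1)^m\frac{(n-k-2)!}{(n-k-m-2)!},$$
$$\Big(\frac{k+5}{2}-n\Big)_m=\frac{(-1)^m(2n-k-5)!!}{2^m\,(2n-k-2m-5)!!},\qquad\Big(\frac{k}{2}-n+3\Big)_m=\frac{(-1)^m(2n-k-6)!!}{2^m\,(2n-k-2m-6)!!},$$
we have that
$$\begin{aligned}
{}_3F_2\Big(\begin{matrix}1,\ 2-n,\ k+2-n\\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3\end{matrix};1\Big) &= \sum_{m\ge 0}\frac{(1)_m(2-n)_m(k+2-n)_m}{\big(\frac{k+5}{2}-n\big)_m\big(\frac{k}{2}-n+3\big)_m\,m!}\\
&= \sum_{m=0}^{n-k-2}\frac{m!\,(n-2)!\,(n-k-2)!\,4^m\,(2n-k-2m-5)!!\,(2n-k-2m-6)!!}{(n-m-2)!\,(n-k-m-2)!\,(2n-k-5)!!\,(2n-k-6)!!\,m!}\\
&= \sum_{m=0}^{n-k-2}\frac{(n-2)!\,(n-k-2)!\,(2n-k-2m-5)!\,4^m}{(n-m-2)!\,(n-k-m-2)!\,(2n-k-5)!}\\
&= \frac{(n-2)!\,(n-k-2)!}{(2n-k-5)!}\sum_{m=0}^{n-k-2}\frac{(2n-k-2m-5)!\,4^m}{(n-m-2)!\,(n-k-m-2)!},
\end{aligned}$$
from where we deduce that
$$\sum_{m=0}^{n-k-2}\frac{(2n-k-2m-5)!\,4^m}{(n-m-2)!\,(n-k-m-2)!}=\frac{(2n-k-5)!}{(n-2)!\,(n-k-2)!}\;{}_3F_2\Big(\begin{matrix}1,\ 2-n,\ k+2-n\\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3\end{matrix};1\Big)$$
and hence
$$\begin{aligned}
f_{k,n} &= \frac{(n-2)!\,k}{2^{n-k-2}}\sum_{m=0}^{n-k-2}\frac{4^m\,(2n-2m-k-5)!}{(n-m-2)!\,(n-m-k-2)!}\\
&= \frac{(n-2)!\,k}{2^{n-k-2}}\cdot\frac{(2n-k-5)!}{(n-2)!\,(n-k-2)!}\;{}_3F_2\Big(\begin{matrix}1,\ 2-n,\ k+2-n\\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3\end{matrix};1\Big)\\
&= \frac{(2n-k-5)!\,k}{(2n-2k-4)!!}\;{}_3F_2\Big(\begin{matrix}1,\ 2-n,\ k+2-n\\ \frac{k+5}{2}-n,\ \frac{k}{2}-n+3\end{matrix};1\Big),
\end{aligned}$$
as we claimed.

We must now compute the sums
$$\sum_{k=1}^{n-1}k^2d_{k,n},\qquad\sum_{k=1}^{n-2}k^2f_{k,n}.$$
To do that, we shall use the following auxiliary lemma.

Lemma 10. For every $n\ge 2$ and $m\ge 0$, let
$$U_{n,m}=\sum_{k=0}^{n-2}k^m\,\frac{(n+k-2)!}{k!\,2^k}.$$
Then,
$$\begin{aligned}
U_{n,0} &= (2n-4)!!\\
U_{n,1} &= (n-1)(2n-4)!!-(2n-3)!!\\
U_{n,2} &= (n^2-1)(2n-4)!!-(2n-1)(2n-3)!!\\
U_{n,3} &= (n^3+3n^2-3n-1)(2n-4)!!-(3n^2+n-1)(2n-3)!!
\end{aligned}$$

Proof. The proof of these identities is standard, using well known equalities for hypergeometric functions and the lookup algorithm given in [21, p. 36]. We shall prove in detail the identity for $m=2$, and we leave the details of the rest to the reader. Notice that
$$U_{n,2}=\sum_{k=0}^{n-2}k^2\,\frac{(n+k-2)!}{k!\,2^k}=\sum_{k=0}^{n-3}(k+1)^2\frac{(n+k-1)!}{(k+1)!\,2^{k+1}}=\sum_{k=0}^{\infty}(k+1)^2\frac{(n+k-1)!}{(k+1)!\,2^{k+1}}-\sum_{k=n-2}^{\infty}(k+1)^2\frac{(n+k-1)!}{(k+1)!\,2^{k+1}}.$$
Set
$$X_n=\sum_{k=0}^{\infty}(k+1)^2\frac{(n+k-1)!}{(k+1)!\,2^{k+1}},\qquad Y_n=\sum_{k=n-2}^{\infty}(k+1)^2\frac{(n+k-1)!}{(k+1)!\,2^{k+1}}.$$
We compute now these two summands.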

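As to $X_n$, we have that
$$X_n=\frac{(n-1)!}{2}\sum_{k\ge 0}(k+1)^2\frac{(n+k-1)!}{(n-1)!\,(k+1)!\,2^k}.$$
If we set
$$t_k=(k+1)^2\frac{(n+k-1)!}{(n-1)!\,(k+1)!\,2^k},$$
we have that
$$\frac{t_{k+1}}{t_k}=\frac{(k+2)(k+n)}{2(k+1)^2}$$
and therefore, by the lookup algorithm [21, p. 36], we have that
$$\begin{aligned}
X_n &= \frac{(n-1)!}{2}\;{}_2F_1\Big(\begin{matrix}2,\ n\\ 1\end{matrix};\frac{1}{2}\Big)\\
&= (n-1)!\,2^{n-1}\;{}_2F_1\Big(\begin{matrix}n,\ -1\\ 1\end{matrix};-1\Big)\quad\text{(using (15.3.4) in [1, p. 559])}\\
&= (n-1)!\,2^{n-1}\sum_{k\ge 0}\frac{(n)_k(-1)_k}{(1)_k}\cdot\frac{(-1)^k}{k!}\\
&= (n-1)!\,2^{n-1}\Big(\frac{(n)_0(-1)_0}{(1)_0}\cdot\frac{(-1)^0}{0!}+\frac{(n)_1(-1)_1}{(1)_1}\cdot\frac{(-1)^1}{1!}\Big)\\
&= (n-1)!\,2^{n-1}(n+1).
\end{aligned}$$

As to $Y_n$, we have that
$$Y_n=\sum_{k\ge 0}(k+n-1)^2\frac{(2n+k-3)!}{(k+n-1)!\,2^{k+n-1}}=\frac{(n-1)^2(2n-3)!}{(n-1)!\,2^{n-1}}\sum_{k\ge 0}\frac{(k+n-1)^2\,\dfrac{(2n+k-3)!}{(k+n-1)!\,2^k}}{(n-1)^2\,\dfrac{(2n-3)!}{(n-1)!}}.$$
If we take now
$$t_k=\frac{(k+n-1)^2\,\dfrac{(2n+k-3)!}{(k+n-1)!\,2^k}}{(n-1)^2\,\dfrac{(2n-3)!}{(n-1)!}},$$
we have that
$$\frac{t_{k+1}}{t_k}=\frac{(n+k)(2n+k-2)}{2(k+n-1)^2}$$
and therefore, again by the lookup algorithm [21, p. 36], we have that
$$\begin{aligned}
Y_n &= \frac{(n-1)^2(2n-3)!}{2^{n-1}(n-1)!}\;{}_3F_2\Big(\begin{matrix}1,\ n,\ 2n-2\\ n-1,\ n-1\end{matrix};\frac{1}{2}\Big)\\
&= \frac{(n-1)^2(2n-3)!}{2^{n-1}(n-1)!}\Big[{}_2F_1\Big(\begin{matrix}2n-2,\ 1\\ n-1\end{matrix};\frac{1}{2}\Big)+\frac{1}{n-1}\;{}_2F_1\Big(\begin{matrix}2n-1,\ 2\\ n\end{matrix};\frac{1}{2}\Big)\Big]\quad\text{(using [12])}.
\end{aligned}$$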

19 Now Å n, F n ; ã Å n, F n (n ) Γ(n ) [ Γ(n ) Γ(n ) Γ(0) (n ) (n )! [ (n )! + (n 3)! (n )! ã ; + Γ(n) Γ() + Γ(n ) (n 3)!! ] (using (5.3.4) in [, p. 559]) ] Γ( ) (using [3]) + (n 3)!! Å n, F n ; ã Å ã, n F n ; (using (5.3.4) in [, p. 559]) Γ(n) ( Γ(n 4 ) ( n) Γ(n ) Γ( ) + Γ(n + ) ) Γ( 3 ) + Γ(n) (using [3]) n (n )! ( (n 3)!! (n )!! ) (n )! + + (n )! Therefore, (n )! ((n 3)!! + (n )!! + n (n )!) (n )! Y n (n ) (n 3)! [ (n )! (n )! + (n 3)!! + n (n )! ] ((n 3)!! + (n )!! + n (n )!) (n )! n (n + )(n )! + (n )!! and finally as we claimed. U n, X n Y n n (n + )(n )! (n )!! (n )(n 4)!! (n )(n 3)!! Lemma. For every n, k d k,n (4n )(n 3)!! 3(n )!!. Proof. By equation (), k d k,n k 3 n (n k 3)! (n k )! n k k0 (n k ) 3 (n + k )! k! k (n ) 3 U n,0 3(n ) U n, + 3(n )U n, U n,3 (n ) 3 (n 4)!! 3(n ) ( (n )(n 4)!! (n 3)!! ) +3(n ) ( (n )(n 4)!! (n )(n 3)!! ) ( (n 3 + 3n 3n )(n 4)!! (3n + n )(n 3)!! ) (4n )(n 3)!! 3(n )(n 4)!!. 9

20 Lemma. For every n, n k f k,n 3 (4n + )(n 3)!! 3 (n )!!. Proof. To simplify the notations, set S n n proof of Lemma 9, and therefore S n Set now S n f k,n (n )! n (n )!k n k n n n k m0 n k k k 3 m0 k f k,n. As we have seen in the 4 m (n m k 5)! (n m )!(n m k )! 4 m (n k m 5)! (n k )!(n k m )! (n )! n k n k k 3 4 n k m (k + m )! (k + m)!m! m0 ( n (n )! n k 3 n k Ç å) k k + k + m 4 m m k + m ( m (n )! n 6 n n + n + k 3 n k Ç å) k + m k 4 m m k + m m n k 3 n k k m 4 m m Since S 3 0, we have that Ç å k + m k + m n 3 S n (S p+ S p) p3 k 3 n k k m 4 m m Ç å k + m k + m and S p+ S p p 3 Ç (p )3 k 3 p k 3 p + k (p k )4 p k p (p )3 p + p (p )3 p + (p )3 p + p 3 p (p )! p (p )! å k 3 (p k 3)! k (p k )(p )!(p k )! p 3 p 3 k 3 (p k 3)! k (p k )! (p k ) 3 (p + k )! k p+ (k + )! 0

21 (p )3 p + (p )3 p + p (p )! p k [ p (p k ) 3 (p + k )! k k! (p k ) 3 (p + k )! p (p )! k k! k0 (p ) 3 (p )! ] (p )3 (p )! p (p ) (p k ) 3 (p + k )! p + p (p )! k k! k0 (p ) ( ) p + (4p )(p 3)!! 3(p )!! (p )!! (p ) (p 3)!! p + (4p ) (p )!! 3 Therefore ( S n (p 3)!! (4p ) (p )!! p3 (p ) ) p 3 (by Lemma ) Now, applying Gosper s algorithm [, p. 77] we have that (p 3)!! (4p ) (p )!! ( Ç å n 3 ) 3 n+ 3(4n 3n ) 39 n n p3 and then S n ( Ç å n 3 ) 3 n+ 3(4n 3n ) 39 n n n 8(n + ) n+ 3(n 3) n + (4n + )(n 3)!! 3(n + ) +. n 3(n 4)!! Finally, S n Å ã (n )! n 6 n + n + S n 3(n )! n (4n + )(n 3)!! (4n + )(n 3)!! 3 (n )!!.

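Summarizing, by Lemmas 8, 11, and 12, we have that
$$\begin{aligned}
E_U(\Phi^{(2)}_n) &= \frac{1}{(2n-3)!!}\Big(n\sum_{k=1}^{n-1}k^2d_{k,n}+\binom{n}{2}\sum_{k=1}^{n-2}k^2f_{k,n}\Big)\\
&= \frac{1}{(2n-3)!!}\Big[n\big((4n-1)(2n-3)!!-3(2n-2)!!\big)+\binom{n}{2}\Big(\frac{1}{3}(4n+1)(2n-3)!!-\frac{3}{2}(2n-2)!!\Big)\Big]\\
&= \frac{1}{6}n(4n^2+21n-7)-\frac{3n(n+3)}{4}\cdot\frac{(2n-2)!!}{(2n-3)!!},
\end{aligned}$$
as we claimed.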
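The counting identities used in this part of the proof can be checked by exhaustive enumeration for small $n$. The Python sketch below is an illustration and not the authors' script. It assumes the nested-pair encoding of trees and hypothetical helper names, and it reads the statements as follows: $d_{k,n}=\frac{(2n-k-3)!\,k}{(n-k-1)!\,2^{n-k-1}}$, $f_{0,n}=(2n-4)!!$, $\sum_k k^2d_{k,n}=(4n-1)(2n-3)!!-3(2n-2)!!$ (Lemma 11) and $\sum_k k^2f_{k,n}=\frac{1}{3}(4n+1)(2n-3)!!-\frac{3}{2}(2n-2)!!$ (Lemma 12).

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial


def insertions(tree, leaf):
    """Yield every tree obtained by hanging `leaf` above the root or on an edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        for grown in insertions(tree[0], leaf):
            yield (grown, tree[1])
        for grown in insertions(tree[1], leaf):
            yield (tree[0], grown)


def all_trees(n):
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [grown for tree in trees for grown in insertions(tree, leaf)]
    return trees


def contains(tree, leaf):
    if not isinstance(tree, tuple):
        return tree == leaf
    return contains(tree[0], leaf) or contains(tree[1], leaf)


def lca_depth(tree, i, j):
    """Depth of LCA(i, j); with i == j this is the depth of the leaf i."""
    depth = 0
    while isinstance(tree, tuple):
        child = next((c for c in tree if contains(c, i) and contains(c, j)), None)
        if child is None:
            break  # i and j separate here: the current node is their LCA
        tree, depth = child, depth + 1
    return depth


def double_factorial(m):
    result = 1
    while m > 1:
        result, m = result * m, m - 2
    return result


def check(n):
    """Verify the identities for a given n >= 2 and return E_U(Phi^(2)_n)."""
    trees = all_trees(n)
    d = Counter(lca_depth(t, 1, 1) for t in trees)  # d[k] = d_{k,n}
    f = Counter(lca_depth(t, 1, 2) for t in trees)  # f[k] = f_{k,n}
    odd, even = double_factorial(2 * n - 3), double_factorial(2 * n - 2)
    for k in range(1, n):
        assert d[k] == Fraction(factorial(2 * n - k - 3) * k,
                                factorial(n - k - 1) * 2 ** (n - k - 1))
    assert f[0] == double_factorial(2 * n - 4)
    sum_d = sum(k * k * count for k, count in d.items())
    sum_f = sum(k * k * count for k, count in f.items())
    assert sum_d == (4 * n - 1) * odd - 3 * even                        # Lemma 11
    assert sum_f == Fraction((4 * n + 1) * odd, 3) - Fraction(3 * even, 2)  # Lemma 12
    expected = Fraction(n * sum_d + comb(n, 2) * sum_f, odd)            # Lemma 8
    assert expected == (Fraction(n * (4 * n * n + 21 * n - 7), 6)
                        - Fraction(3 * n * (n + 3), 4) * Fraction(even, odd))
    return expected


for n in range(2, 7):
    print(n, check(n))
```

Only leaf 1 and the pair (1, 2) are inspected, which is enough because both counts are the same for every leaf and every pair of leaves.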


More information

Maximum Agreement Subtrees

Maximum Agreement Subtrees Maximum Agreement Subtrees Seth Sullivant North Carolina State University March 24, 2018 Seth Sullivant (NCSU) Maximum Agreement Subtrees March 24, 2018 1 / 23 Phylogenetics Problem Given a collection

More information

On the Subnet Prune and Regraft distance

On the Subnet Prune and Regraft distance On the Subnet Prune and Regraft distance Jonathan Klawitter and Simone Linz Department of Computer Science, University of Auckland, New Zealand jo. klawitter@ gmail. com, s. linz@ auckland. ac. nz arxiv:805.07839v

More information

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES

A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES SIMONE LINZ AND CHARLES SEMPLE Abstract. Calculating the rooted subtree prune and regraft (rspr) distance between two rooted binary

More information

arxiv: v1 [cs.cc] 9 Oct 2014

arxiv: v1 [cs.cc] 9 Oct 2014 Satisfying ternary permutation constraints by multiple linear orders or phylogenetic trees Leo van Iersel, Steven Kelk, Nela Lekić, Simone Linz May 7, 08 arxiv:40.7v [cs.cc] 9 Oct 04 Abstract A ternary

More information

Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies

Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies Bulletin of Mathematical Biology manuscript No. (will be inserted by the editor) Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies Peter J. Humphries Simone

More information

A concise proof of Kruskal s theorem on tensor decomposition

A concise proof of Kruskal s theorem on tensor decomposition A concise proof of Kruskal s theorem on tensor decomposition John A. Rhodes 1 Department of Mathematics and Statistics University of Alaska Fairbanks PO Box 756660 Fairbanks, AK 99775 Abstract A theorem

More information

NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS

NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS PETER J. HUMPHRIES AND CHARLES SEMPLE Abstract. For two rooted phylogenetic trees T and T, the rooted subtree prune and regraft distance

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

arxiv: v1 [math.ra] 13 Jan 2009

arxiv: v1 [math.ra] 13 Jan 2009 A CONCISE PROOF OF KRUSKAL S THEOREM ON TENSOR DECOMPOSITION arxiv:0901.1796v1 [math.ra] 13 Jan 2009 JOHN A. RHODES Abstract. A theorem of J. Kruskal from 1977, motivated by a latent-class statistical

More information

Tree Models for Macro-Evolution and Phylogenetic Analysis

Tree Models for Macro-Evolution and Phylogenetic Analysis Tree Models for Macro-Evolution and Phylogenetic Analysis Graham Jones 5th February 2009 email: art@gjones.name web site: www.indriid.com Abstract It has long been recognized that phylogenetic trees are

More information

arxiv: v1 [math.pr] 12 Jan 2017

arxiv: v1 [math.pr] 12 Jan 2017 Central limit theorem for the Horton-Strahler bifurcation ratio of general branch order Ken Yamamoto Department of Physics and arth Sciences, Faculty of Science, University of the Ryukyus, 1 Sembaru, Nishihara,

More information

THE TRIPLES DISTANCE FOR ROOTED BIFURCATING PHYLOGENETIC TREES

THE TRIPLES DISTANCE FOR ROOTED BIFURCATING PHYLOGENETIC TREES Syst. Biol. 45(3):33-334, 1996 THE TRIPLES DISTANCE FOR ROOTED BIFURCATING PHYLOGENETIC TREES DOUGLAS E. CRITCHLOW, DENNIS K. PEARL, AND CHUNLIN QIAN Department of Statistics, Ohio State University, Columbus,

More information

The Moran Process as a Markov Chain on Leaf-labeled Trees

The Moran Process as a Markov Chain on Leaf-labeled Trees The Moran Process as a Markov Chain on Leaf-labeled Trees David J. Aldous University of California Department of Statistics 367 Evans Hall # 3860 Berkeley CA 94720-3860 aldous@stat.berkeley.edu http://www.stat.berkeley.edu/users/aldous

More information

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas. Solving the Maximum Agreement Subtree and Maximum Compatible Tree problems on bounded degree trees LIRMM, Montpellier France 4th July 2006 Introduction The Mast and Mct problems: given a set of evolutionary

More information

PLEASE SCROLL DOWN FOR ARTICLE

PLEASE SCROLL DOWN FOR ARTICLE This article was downloaded by:[ujf - INP Grenoble SICD 1] [UJF - INP Grenoble SICD 1] On: 6 June 2007 Access Details: [subscription number 772812057] Publisher: Taylor & Francis Informa Ltd Registered

More information

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT COMMUNICATIONS IN INFORMATION AND SYSTEMS c 2009 International Press Vol. 9, No. 4, pp. 295-302, 2009 001 THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT DAN GUSFIELD AND YUFENG WU Abstract.

More information

Properties of normal phylogenetic networks

Properties of normal phylogenetic networks Properties of normal phylogenetic networks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu August 13, 2009 Abstract. A phylogenetic network is

More information

Greedy Trees, Caterpillars, and Wiener-Type Graph Invariants

Greedy Trees, Caterpillars, and Wiener-Type Graph Invariants Georgia Southern University Digital Commons@Georgia Southern Mathematical Sciences Faculty Publications Mathematical Sciences, Department of 2012 Greedy Trees, Caterpillars, and Wiener-Type Graph Invariants

More information

On the mean connected induced subgraph order of cographs

On the mean connected induced subgraph order of cographs AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 71(1) (018), Pages 161 183 On the mean connected induced subgraph order of cographs Matthew E Kroeker Lucas Mol Ortrud R Oellermann University of Winnipeg Winnipeg,

More information

Gap Embedding for Well-Quasi-Orderings 1

Gap Embedding for Well-Quasi-Orderings 1 WoLLIC 2003 Preliminary Version Gap Embedding for Well-Quasi-Orderings 1 Nachum Dershowitz 2 and Iddo Tzameret 3 School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel Abstract Given a

More information

A Geometric Approach to Tree Shape Statistics

A Geometric Approach to Tree Shape Statistics Syst. Biol. 55(4):652 661, 2006 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150600889617 A Geometric Approach to Tree Shape Statistics FREDERICK

More information

Historical Biogeography. Historical Biogeography. Systematics

Historical Biogeography. Historical Biogeography. Systematics Historical Biogeography I. Definitions II. Fossils: problems with fossil record why fossils are important III. Phylogeny IV. Phenetics VI. Phylogenetic Classification Disjunctions debunked: Examples VII.

More information

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT

More information

On Symmetries of Non-Plane Trees in a Non-Uniform Model

On Symmetries of Non-Plane Trees in a Non-Uniform Model On Symmetries of Non-Plane Trees in a Non-Uniform Model Jacek Cichoń Abram Magner Wojciech Spankowski Krystof Turowski October 3, 26 Abstract Binary trees come in two varieties: plane trees, often simply

More information

MAXIMAL CLADES IN RANDOM BINARY SEARCH TREES

MAXIMAL CLADES IN RANDOM BINARY SEARCH TREES MAXIMAL CLADES IN RANDOM BINARY SEARCH TREES SVANTE JANSON Abstract. We study maximal clades in random phylogenetic trees with the Yule Harding model or, equivalently, in binary search trees. We use probabilistic

More information

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION MAGNUS BORDEWICH, KATHARINA T. HUBER, VINCENT MOULTON, AND CHARLES SEMPLE Abstract. Phylogenetic networks are a type of leaf-labelled,

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

On improving matchings in trees, via bounded-length augmentations 1

On improving matchings in trees, via bounded-length augmentations 1 On improving matchings in trees, via bounded-length augmentations 1 Julien Bensmail a, Valentin Garnero a, Nicolas Nisse a a Université Côte d Azur, CNRS, Inria, I3S, France Abstract Due to a classical

More information

arxiv: v1 [q-bio.pe] 3 May 2016

arxiv: v1 [q-bio.pe] 3 May 2016 PHYLOGENETIC TREES AND EUCLIDEAN EMBEDDINGS MARK LAYER AND JOHN A. RHODES arxiv:1605.01039v1 [q-bio.pe] 3 May 2016 Abstract. It was recently observed by de Vienne et al. that a simple square root transformation

More information

arxiv:math/ v2 [math.qa] 30 Mar 2005

arxiv:math/ v2 [math.qa] 30 Mar 2005 THE OPERAD QUAD IS KOSZUL JON EIVIND VATNE arxiv:math/04580v2 [math.qa] 30 Mar 2005 Abstract. The purpose of this paper is to prove the koszulity of the operad Quad, governing quadri-algebras. That Quad

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

arxiv: v1 [math.co] 25 Jun 2014

arxiv: v1 [math.co] 25 Jun 2014 THE NON-PURE VERSION OF THE SIMPLEX AND THE BOUNDARY OF THE SIMPLEX NICOLÁS A. CAPITELLI arxiv:1406.6434v1 [math.co] 25 Jun 2014 Abstract. We introduce the non-pure versions of simplicial balls and spheres

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

Coding and Counting Arrangements of Pseudolines

Coding and Counting Arrangements of Pseudolines Coding and Counting Arrangements of Pseudolines Stefan Felsner Institut für Mathematik Technische Universität Berlin Strasse des 17. Juni 136 D-1063 Berlin, Germany Pavel Valtr Department of Applied Mathematics

More information

Chapter 11. Min Cut Min Cut Problem Definition Some Definitions. By Sariel Har-Peled, December 10, Version: 1.

Chapter 11. Min Cut Min Cut Problem Definition Some Definitions. By Sariel Har-Peled, December 10, Version: 1. Chapter 11 Min Cut By Sariel Har-Peled, December 10, 013 1 Version: 1.0 I built on the sand And it tumbled down, I built on a rock And it tumbled down. Now when I build, I shall begin With the smoke from

More information

The maximum agreement subtree problem

The maximum agreement subtree problem The maximum agreement subtree problem arxiv:1201.5168v5 [math.co] 20 Feb 2013 Daniel M. Martin Centro de Matemática, Computação e Cognição Universidade Federal do ABC, Santo André, SP 09210-170 Brasil.

More information

k-protected VERTICES IN BINARY SEARCH TREES

k-protected VERTICES IN BINARY SEARCH TREES k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from

More information

Math 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E.

Math 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E. Math 239: Discrete Mathematics for the Life Sciences Spring 2008 Lecture 14 March 11 Lecturer: Lior Pachter Scribe/ Editor: Maria Angelica Cueto/ C.E. Csar 14.1 Introduction The goal of today s lecture

More information

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi DNA Phylogeny Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi Phylogenetics Grouping and Division of organisms Keeps changing with time Splitting, hybridization and termination Cladistics :

More information

Bounded Treewidth Graphs A Survey German Russian Winter School St. Petersburg, Russia

Bounded Treewidth Graphs A Survey German Russian Winter School St. Petersburg, Russia Bounded Treewidth Graphs A Survey German Russian Winter School St. Petersburg, Russia Andreas Krause krausea@cs.tum.edu Technical University of Munich February 12, 2003 This survey gives an introduction

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms CSE 101, Winter 2018 Design and Analysis of Algorithms Lecture 5: Divide and Conquer (Part 2) Class URL: http://vlsicad.ucsd.edu/courses/cse101-w18/ A Lower Bound on Convex Hull Lecture 4 Task: sort the

More information

Binary Decision Diagrams. Graphs. Boolean Functions

Binary Decision Diagrams. Graphs. Boolean Functions Binary Decision Diagrams Graphs Binary Decision Diagrams (BDDs) are a class of graphs that can be used as data structure for compactly representing boolean functions. BDDs were introduced by R. Bryant

More information

Scalar and fuzzy cardinalities of crisp and fuzzy multisets

Scalar and fuzzy cardinalities of crisp and fuzzy multisets Scalar and fuzzy cardinalities of crisp and fuzzy multisets Jaume Casasnovas, Francesc Rosselló Department of Mathematics and Computer Science, University of the Balearic Islands, 07122 Palma de Mallorca

More information

The Subtree Size Profile of Plane-oriented Recursive Trees

The Subtree Size Profile of Plane-oriented Recursive Trees The Subtree Size Profile of Plane-oriented Recursive Trees Michael Fuchs Department of Applied Mathematics National Chiao Tung University Hsinchu, Taiwan ANALCO11, January 22nd, 2011 Michael Fuchs (NCTU)

More information

The asymptotic number of prefix normal words

The asymptotic number of prefix normal words The asymptotic number of prefix normal words Paul Balister Stefanie Gerke July 6, 2018 Abstract We show that the number of prefix normal binary words of length n is 2 n Θ((log n)2). We also show that the

More information

On the Average Path Length of Complete m-ary Trees

On the Average Path Length of Complete m-ary Trees 1 2 3 47 6 23 11 Journal of Integer Sequences, Vol. 17 2014, Article 14.6.3 On the Average Path Length of Complete m-ary Trees M. A. Nyblom School of Mathematics and Geospatial Science RMIT University

More information

Reconstruction of certain phylogenetic networks from their tree-average distances

Reconstruction of certain phylogenetic networks from their tree-average distances Reconstruction of certain phylogenetic networks from their tree-average distances Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu October 10,

More information

Pattern Popularity in 132-Avoiding Permutations

Pattern Popularity in 132-Avoiding Permutations Pattern Popularity in 132-Avoiding Permutations The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Rudolph,

More information

A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES. 1. Introduction

A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES. 1. Introduction A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES MAGNUS BORDEWICH 1, CATHERINE MCCARTIN 2, AND CHARLES SEMPLE 3 Abstract. In this paper, we give a (polynomial-time) 3-approximation

More information

A Polynomial Time Deterministic Algorithm for Identity Testing Read-Once Polynomials

A Polynomial Time Deterministic Algorithm for Identity Testing Read-Once Polynomials A Polynomial Time Deterministic Algorithm for Identity Testing Read-Once Polynomials Daniel Minahan Ilya Volkovich August 15, 2016 Abstract The polynomial identity testing problem, or PIT, asks how we

More information

arxiv: v1 [math.co] 3 Nov 2014

arxiv: v1 [math.co] 3 Nov 2014 SPARSE MATRICES DESCRIBING ITERATIONS OF INTEGER-VALUED FUNCTIONS BERND C. KELLNER arxiv:1411.0590v1 [math.co] 3 Nov 014 Abstract. We consider iterations of integer-valued functions φ, which have no fixed

More information

Phylogenetic Algebraic Geometry

Phylogenetic Algebraic Geometry Phylogenetic Algebraic Geometry Seth Sullivant North Carolina State University January 4, 2012 Seth Sullivant (NCSU) Phylogenetic Algebraic Geometry January 4, 2012 1 / 28 Phylogenetics Problem Given a

More information

The Chinese Remainder Theorem

The Chinese Remainder Theorem Chapter 5 The Chinese Remainder Theorem 5.1 Coprime moduli Theorem 5.1. Suppose m, n N, and gcd(m, n) = 1. Given any remainders r mod m and s mod n we can find N such that N r mod m and N s mod n. Moreover,

More information

Binary Decision Diagrams

Binary Decision Diagrams Binary Decision Diagrams Binary Decision Diagrams (BDDs) are a class of graphs that can be used as data structure for compactly representing boolean functions. BDDs were introduced by R. Bryant in 1986.

More information

Worst case analysis for a general class of on-line lot-sizing heuristics

Worst case analysis for a general class of on-line lot-sizing heuristics Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University

More information

The Complexity of Constructing Evolutionary Trees Using Experiments

The Complexity of Constructing Evolutionary Trees Using Experiments The Complexity of Constructing Evolutionary Trees Using Experiments Gerth Stlting Brodal 1,, Rolf Fagerberg 1,, Christian N. S. Pedersen 1,, and Anna Östlin2, 1 BRICS, Department of Computer Science, University

More information

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1 Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 Lecturer: Wing-Kin Sung Scribe: Ning K., Shan T., Xiang

More information

The edge-diametric theorem in Hamming spaces

The edge-diametric theorem in Hamming spaces Discrete Applied Mathematics 56 2008 50 57 www.elsevier.com/locate/dam The edge-diametric theorem in Hamming spaces Christian Bey Otto-von-Guericke-Universität Magdeburg, Institut für Algebra und Geometrie,

More information

Spectrally arbitrary star sign patterns

Spectrally arbitrary star sign patterns Linear Algebra and its Applications 400 (2005) 99 119 wwwelseviercom/locate/laa Spectrally arbitrary star sign patterns G MacGillivray, RM Tifenbach, P van den Driessche Department of Mathematics and Statistics,

More information

Distribution of the Number of Encryptions in Revocation Schemes for Stateless Receivers

Distribution of the Number of Encryptions in Revocation Schemes for Stateless Receivers Discrete Mathematics and Theoretical Computer Science DMTCS vol. subm., by the authors, 1 1 Distribution of the Number of Encryptions in Revocation Schemes for Stateless Receivers Christopher Eagle 1 and

More information

A Cubic-Vertex Kernel for Flip Consensus Tree

A Cubic-Vertex Kernel for Flip Consensus Tree To appear in Algorithmica A Cubic-Vertex Kernel for Flip Consensus Tree Christian Komusiewicz Johannes Uhlmann Received: date / Accepted: date Abstract Given a bipartite graph G = (V c, V t, E) and a nonnegative

More information

Reconstructing Trees from Subtree Weights

Reconstructing Trees from Subtree Weights Reconstructing Trees from Subtree Weights Lior Pachter David E Speyer October 7, 2003 Abstract The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree

More information

Acyclic and Oriented Chromatic Numbers of Graphs

Acyclic and Oriented Chromatic Numbers of Graphs Acyclic and Oriented Chromatic Numbers of Graphs A. V. Kostochka Novosibirsk State University 630090, Novosibirsk, Russia X. Zhu Dept. of Applied Mathematics National Sun Yat-Sen University Kaohsiung,

More information

The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth

The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth Gregory Gutin, Mark Jones, and Magnus Wahlström Royal Holloway, University of London Egham, Surrey TW20 0EX, UK Abstract In the

More information

d(ν) = max{n N : ν dmn p n } N. p d(ν) (ν) = ρ.

d(ν) = max{n N : ν dmn p n } N. p d(ν) (ν) = ρ. 1. Trees; context free grammars. 1.1. Trees. Definition 1.1. By a tree we mean an ordered triple T = (N, ρ, p) (i) N is a finite set; (ii) ρ N ; (iii) p : N {ρ} N ; (iv) if n N + and ν dmn p n then p n

More information

Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel

Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel SUPERTREE ALGORITHMS FOR ANCESTRAL DIVERGENCE DATES AND NESTED TAXA Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel Department of Mathematics and Statistics University of Canterbury

More information

Unavoidable subtrees

Unavoidable subtrees Unavoidable subtrees Maria Axenovich and Georg Osang July 5, 2012 Abstract Let T k be a family of all k-vertex trees. For T T k and a tree T, we write T T if T contains at least one of the trees from T

More information

SOME TRANSFINITE INDUCTION DEDUCTIONS

SOME TRANSFINITE INDUCTION DEDUCTIONS SOME TRANSFINITE INDUCTION DEDUCTIONS SYLVIA DURIAN Abstract. This paper develops the ordinal numbers and transfinite induction, then demonstrates some interesting applications of transfinite induction.

More information

arxiv: v1 [math.ca] 4 Oct 2017

arxiv: v1 [math.ca] 4 Oct 2017 ASYMPTOTICS OF SIGNED BERNOULLI CONVOLUTIONS SCALED BY MULTINACCI NUMBERS arxiv:1710.01780v1 [math.ca] 4 Oct 2017 XIANGHONG CHEN AND TIAN-YOU HU Abstract. We study the signed Bernoulli convolution ( ν

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200 Spring 2018 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley D.D. Ackerly Feb. 26, 2018 Maximum Likelihood Principles, and Applications to

More information

Prime Factorization and Domination in the Hierarchical Product of Graphs

Prime Factorization and Domination in the Hierarchical Product of Graphs Prime Factorization and Domination in the Hierarchical Product of Graphs S. E. Anderson 1, Y. Guo 2, A. Rubin 2 and K. Wash 2 1 Department of Mathematics, University of St. Thomas, St. Paul, MN 55105 2

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

SOLUTIONS OF SEMILINEAR WAVE EQUATION VIA STOCHASTIC CASCADES

SOLUTIONS OF SEMILINEAR WAVE EQUATION VIA STOCHASTIC CASCADES Communications on Stochastic Analysis Vol. 4, No. 3 010) 45-431 Serials Publications www.serialspublications.com SOLUTIONS OF SEMILINEAR WAVE EQUATION VIA STOCHASTIC CASCADES YURI BAKHTIN* AND CARL MUELLER

More information

Analysis of Approximate Quickselect and Related Problems

Analysis of Approximate Quickselect and Related Problems Analysis of Approximate Quickselect and Related Problems Conrado Martínez Univ. Politècnica Catalunya Joint work with A. Panholzer and H. Prodinger LIP6, Paris, April 2009 Introduction Quickselect finds

More information

Graphs with few total dominating sets

Graphs with few total dominating sets Graphs with few total dominating sets Marcin Krzywkowski marcin.krzywkowski@gmail.com Stephan Wagner swagner@sun.ac.za Abstract We give a lower bound for the number of total dominating sets of a graph

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 2. Countable Markov Chains I started Chapter 2 which talks about Markov chains with a countably infinite number of states. I did my favorite example which is on

More information

Definability in the Enumeration Degrees

Definability in the Enumeration Degrees Definability in the Enumeration Degrees Theodore A. Slaman W. Hugh Woodin Abstract We prove that every countable relation on the enumeration degrees, E, is uniformly definable from parameters in E. Consequently,

More information

arxiv: v1 [cs.ds] 1 Nov 2018

arxiv: v1 [cs.ds] 1 Nov 2018 An O(nlogn) time Algorithm for computing the Path-length Distance between Trees arxiv:1811.00619v1 [cs.ds] 1 Nov 2018 David Bryant Celine Scornavacca November 5, 2018 Abstract Tree comparison metrics have

More information

EXACT DOUBLE DOMINATION IN GRAPHS

EXACT DOUBLE DOMINATION IN GRAPHS Discussiones Mathematicae Graph Theory 25 (2005 ) 291 302 EXACT DOUBLE DOMINATION IN GRAPHS Mustapha Chellali Department of Mathematics, University of Blida B.P. 270, Blida, Algeria e-mail: mchellali@hotmail.com

More information

APPLICATION OF HOARE S THEOREM TO SYMMETRIES OF RIEMANN SURFACES

APPLICATION OF HOARE S THEOREM TO SYMMETRIES OF RIEMANN SURFACES Annales Academiæ Scientiarum Fennicæ Series A. I. Mathematica Volumen 18, 1993, 307 3 APPLICATION OF HOARE S THEOREM TO SYMMETRIES OF RIEMANN SURFACES E. Bujalance, A.F. Costa, and D. Singerman Universidad

More information

Homotopy and homology groups of the n-dimensional Hawaiian earring

Homotopy and homology groups of the n-dimensional Hawaiian earring F U N D A M E N T A MATHEMATICAE 165 (2000) Homotopy and homology groups of the n-dimensional Hawaiian earring by Katsuya E d a (Tokyo) and Kazuhiro K a w a m u r a (Tsukuba) Abstract. For the n-dimensional

More information

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

Minimum evolution using ordinary least-squares is less robust than neighbor-joining Minimum evolution using ordinary least-squares is less robust than neighbor-joining Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA email: swillson@iastate.edu November

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees

Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees nnals of Combinatorics 5 (2001) 1-15 0218-0006/01/010001-15$1.50+0.20/0 c Birkhäuser Verlag, Basel, 2001 nnals of Combinatorics Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees

More information

Enumerating split-pair arrangements

Enumerating split-pair arrangements Journal of Combinatorial Theory, Series A 115 (28 293 33 www.elsevier.com/locate/jcta Enumerating split-pair arrangements Ron Graham 1, Nan Zang Department of Computer Science, University of California

More information

Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility

Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility Tao Jiang, Ming Li, Brendan Lucier September 26, 2005 Abstract In this paper we study the Kolmogorov Complexity of a

More information

arxiv: v2 [math.pr] 23 Feb 2017

arxiv: v2 [math.pr] 23 Feb 2017 TWIN PEAKS KRZYSZTOF BURDZY AND SOUMIK PAL arxiv:606.08025v2 [math.pr] 23 Feb 207 Abstract. We study random labelings of graphs conditioned on a small number (typically one or two) peaks, i.e., local maxima.

More information

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

More information

arxiv: v1 [math.pr] 21 Mar 2014

arxiv: v1 [math.pr] 21 Mar 2014 Asymptotic distribution of two-protected nodes in ternary search trees Cecilia Holmgren Svante Janson March 2, 24 arxiv:4.557v [math.pr] 2 Mar 24 Abstract We study protected nodes in m-ary search trees,

More information

Phylogenetic Tree Shape

Phylogenetic Tree Shape Phylogenetic Tree Shape Arne Mooers Simon Fraser University Vancouver, Canada Discussions with generous colleagues & students: Mark Pagel, Sean Nee (Tree Statistics) Mike Steel (Tree shapes) Steve Heard,

More information

Improved maximum parsimony models for phylogenetic networks

Improved maximum parsimony models for phylogenetic networks Improved maximum parsimony models for phylogenetic networks Leo van Iersel Mark Jones Celine Scornavacca December 20, 207 Abstract Phylogenetic networks are well suited to represent evolutionary histories

More information

should be presented and explained in the combined species tree (Fitch, 1970; Goodman et al., 1979). The gene divergence can be the results of either s

should be presented and explained in the combined species tree (Fitch, 1970; Goodman et al., 1979). The gene divergence can be the results of either s On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies Louxin Zhang lxzhang@iss.nus.sg BioInformatics Center Institute of Systems Science Heng Mui Keng Terrace Singapore 119597 Abstract

More information