The expected value of the squared euclidean cophenetic metric under the Yule and the uniform models
|
|
- Harold Jackson
- 6 years ago
- Views:
Transcription
1 The expected value of the squared euclidean cophenetic metric under the Yule and the uniform models Gabriel Cardona, Arnau Mir, Francesc Rosselló Department of Mathematics and Computer Science, University of the Balearic Islands, E-07 Palma de Mallorca, Spain arxiv:30.53v [q-bio.pe] Jan 03 Abstract The cophenetic metrics d ϕ,p, for p {0} [, [, are a recent addition to the kit of available distances for the comparison of phylogenetic trees. Based on a fifty years old idea of Sokal and Rohlf, these metrics compare phylogenetic trees on a same set of taxa by encoding them by means of their vectors of cophenetic values of pairs of taxa and depths of single taxa, and then computing the L p norm of the difference of the corresponding vectors. In this paper we compute the expected value of the square of d ϕ, on the space of fully resolved rooted phylogenetic trees with n leaves, under the Yule and the uniform probability distributions. Keywords: Phylogenetic tree, Cophenetic metric, Uniform model, Yule model, Sackin index, Total cophenetic index. Introduction The definition and study of metrics for the comparison of rooted phylogenetic trees on the same set of taxa is a classical problem in phylogenetics [0, Ch. 30], and many metrics have been introduced so far with this purpose. A recent addition to the set of metrics available in this context are the cophenetic metrics d ϕ,p introduced in [8]. Based on a fifty years old idea of Sokal and Rohlf, these metrics compare phylogenetic trees on a same set of taxa by first encoding the trees by means of their vectors of cophenetic values of pairs of taxa and depths of single taxa, and then computing the L p norm of the difference of the corresponding vectors. Once the disimilarity between two phylogenetic trees has been computed through a given metric, it is convenient in many situations to assess its signifiance. One possibility is to compare the value obtained with its expected, or Corresponding author addresses: gabriel.cardona@uib.es (Gabriel Cardona), arnau.mir@uib.es (Arnau Mir), cesc.rossello@uib.es (Francesc Rosselló) Preprint submitted to Elsevier January 3, 03
2 mean, value: is it much larger, much smaller, similar? [8] This makes it necessary to study the distribution of the metric, or, at least, to have a formula for the expected value of the metric for any number n of leaves. The distribution of several metrics has been studied so far: see, for instance, [5, 6, 6, 7, 8]. The expected value of a distance depends on the probability distribution on the space of phylogenetic trees under consideration. The most popular distribution on the space T n of binary phylogenetic trees with n leaves is the uniform distribution, under which all trees in T n are equiprobable. But phylogeneticists consider also other probability distributions on T n, defined through stochastic models of evolution [0, Ch. 33]. The most popular is the so-called Yule model [4, 9], defined by an evolutionary process where, at each step, each currently extant species can give rise, with the same probability, to two new species. Under this model, different phylogenetic trees with the same number of leaves may have different probabilities, which depend on their shape. In this paper we provide explicit formulas for the expected values under the uniform and the Yule models of the square of the euclidean cophenetic metric d ϕ,. The proofs of these formulas are based on long and tedious algebraic computations and thus, to ease the task of the reader interested only in the formulas and the path leading to them, but not in the details, we have moved these computations to an Appendix at the end of the paper. Besides the aforemenentioned application of this value in the assessment of tree comparisons, the knowledge of formulas for the expected value of d ϕ, under different models may allow the use of d ϕ, to test stochastic models of tree growth, a popular line of research in the last years which so far has been mostly based on shape indices; see, for instance, [3, 9]. As a proof of concept, in 4 we report on a basic, preliminary such test performed on the binary phylogenetic trees contained in the TreeBASE database [0].. Preliminaries In this paper, by a phylogenetic tree on a set S of taxa we mean a fully resolved, or binary, rooted tree with its leaves bijectively labeled in S. We understand such a rooted tree as a directed graph, with its arcs pointing away from the root. To simplify the language, we shall always identify a leaf of a phylogenetic tree with its label. We shall also use the term phylogenetic tree with n leaves to refer to a phylogenetic tree on the set {,..., n}. We shall denote by T (S) the space of all phylogenetic trees on S and by T n the space of all phylogenetic trees with n leaves. Let T be a phylogenetic tree. If there exists a directed path from u to v in T, we shall say that v is a descendant of u and also that u is an ancestor of v. The lowest common ancestor LCA T (u, v) of a pair of nodes u, v in T is the unique common ancestor of them that is a descendant of every other common ancestor of them. The depth δ T (v) of a node v in T is the distance (in number of arcs) from the root of T to v. The cophenetic value ϕ T (i, j) of a pair of leaves i, j in T is the depth of their LCA. To simplify the notations, we shall often write ϕ T (i, i) to denote the depth δ T (i) of a leaf i.
3 Given two phylogenetic trees T, T on disjoint sets of taxa S, S, respectively, we shall denote by T T the phylogenetic tree on S S obtained by connecting the roots of T and T to a (new) common root. Every phylogenetic tree T T n is obtained as T k T n k, for some k n, some subset S k {,..., n} with k elements, some tree T k on S k and some tree T n k on Sc k {,..., n}\s k. Actually, every phylogenetic tree in T n is obtained in this way twice. The Yule, or Equal-Rate Markov, model of evolution [4, 9] is a stochastic model of phylogenetic trees growth. It starts with a node, and at every step a leaf is chosen randomly and uniformly and it is splitted into two leaves. Finally, the labels are assigned randomly and uniformly to the leaves once the desired number of leaves is reached. This corresponds to a model of evolution where, at each step, each currently extant species can give rise, with the same probability, to two new species. Under this stochastic model, if T T n is a phylogenetic tree with set of internal nodes V int (T ), and if for every v V int (T ) we denote by l T (v) the number of its descendant leaves, then the probability of T is [4, 7] P Y (T ) n! v V int(t ) l T (v). The uniform, or Proportional to Distinguishable Arrangements, model [] is another stochastic model of phylogenetic trees growth. Unlike the Yule model, its main feature is that all phylogenetic trees T T n have the same probability: P U (T ), where (n 3)!! (n 3)(n 5) 3. (n 3)!! From the point of view of tree growth, this model is described as the process that starts with a node labeled and then, at the k-th step, a new pendant arc, ending in the leaf labeled k +, is added either to a new root (whose other child will be, then, the original root) or to some edge, with all possible locations of this new pendant arc being equiprobable [9, 6]. Although this is not an explicit model of evolution, only of tree growth, several interpretations of it in terms of evolutionary processes have been given in the literature: see [3, p. 686] and the references therein. 3. Main results Let T T n be a phylogenetic tree with n leaves. The cophenetic vector of T is ϕ(t ) ( ϕ T (i, j) ) i j n Rn(n+)/, with its elements lexicographically ordered in (i, j). It turns out [8] that the mapping ϕ : T n R n(n+)/ sending each T T n to its cophenetic vector ϕ(t ), is injective up to isomorphism. As it is well known, this allows to induce metrics on T n from metrics defined on powers of R. In particular, in this paper 3
4 we consider the cophenetic metric d ϕ, on T n induced by the euclidean distance: d ϕ, (T, T ) (ϕ T (i, j) ϕ T (i, j)). i j n To distinguish it from other cophenetic metrics obtained through other L p normes, we shall call it the euclidean cophenetic metric. Example. Consider the phylogenetic trees T, T T 4 depicted in Fig.. Their total cophenetic vectors are ϕ(t ) (,, 0, 0,, 0, 0,,, ) ϕ(t ) (, 0, 0, 0,,,, 3,, 3) and therefore d ϕ, (T, T ) 7. As we shall see below, the expected values of the square of d ϕ, on T 4 under the uniform and the Yule models are, respectively, 0.56 and 9.4, and hence these two trees are quite more similar than average with respect to the euclidean cophenetic metric under both models. 3 4 T 3 4 T Figure : Two phylogenetic trees with 4 leaves. Let Dn the random variable that chooses a pair of trees T, T T n and computes d ϕ, (T, T ). Its expected values under the Yule and the uniform models are given by the following two theorems. Recall that the n-th harmonic number H n is defined as H n n i /i. Theorem. For every n, the expected value of Dn under the Yule model is E Y (Dn) n ( 3n 0n + 8(n + )H n 4(n + )H ) n. n Theorem 3. For every n, the expected value of Dn under the uniform model is E U (Dn) Å ã 3 (4n3 +8n n(n + 3) (n )!! n(n + 7) (n )!! 0n) (n 3)!! 4 (n 3)!! Since H n ln(n) and (n )!!/(n 3)!! πn, these formulas imply that ( 4 E Y (Dn) 6n, E U (Dn) 3 π ) n 3. 4 We shall prove the formulas in Theorems and 3 by reducing the computation of the expected value of D n to that of the following random variables: 4
5 S n, the random variable that chooses a tree T T n and computes its Sackin index S [3], defined by S(T ) n δ T (i) Φ n, the random variable that chooses a tree T T n and computes its total cophenetic index Φ [8], defined by Φ(T ) ϕ T (i, j) Φ () n i i<j n, the random variable that chooses a tree T T n and computes Φ () (T ) ϕ T (i, j) i j n For the models under consideration, the expected values of these variables are related to that of Dn by the next proposition. In it and henceforth, we shall denote by E(X) the expected value of a random variable X on T n under a generic probability distribution p : T n [0, ] on T n invariant under relabelings. The probability distributions p Y and p U defined by the Yule and the uniform models, respectively, are invariant under relabelings, and therefore the expected values under these specific models, which will be denoted by E Y and E U, respectively, are special cases of E. Proposition 4. E(Dn) E(Φ () n ) E(S n) 4 E(Φ n) n n(n ). Proof. To simplify the notations, let ϕ n be the random variable that chooses a tree T T n and computes ϕ T (, ). δ n be the random variable that chooses a tree T T n and computes δ T (). Let us compute now E(Dn) from its very definition: E(Dn) d ϕ, (T, T ) p(t )p(t ) (T,T ) T n ( (T,T ) T n i j n i j n (T,T ) Tn i j n ( (T,T ) T n (T,T ) T n (ϕ T (i, j) ϕ T (i, j)) )p(t )p(t ) (ϕ T (i, j) + ϕ T (i, j) ϕ T (i, j)ϕ T (i, j))p(t )p(t ) ϕ T (i, j) p(t )p(t ) + ) ϕ T (i, j)ϕ T (i, j)p(t )p(t ) (T,T ) T n ϕ T (i, j) p(t )p(t ) 5
6 ( ϕ T (i, j) p(t ) + ϕ T (i, j) p(t ) i j n( T T n T T )( n )) ϕ T (i, j)p(t ) ϕ T (i, j)p(t ) T T n T T n ( ( ) ) ϕ T (i, j) p(t ) ϕ T (i, j)p(t ) i j n T T n T T n ( ( ϕ T (i, j) )p(t ) ϕ T (i, j)p(t ) T T n i j n i<j n T T n ( ) ϕ T (i, i)p(t ) i n T T n Ç å Φ () n ( ) (T )p(t ) ϕ T (, )p(t ) T T n T T ( n ) n δ T ()p(t ) T T n E(Φ () n ) n(n )E(ϕ n ) ne(δ n ) Now, the values of E(δ n ) and E(ϕ n ) can be easily obtained from E(S n ) and E(Φ n ), respectively, using the invariance under relabelings of the probability distribution under which we compute the expected values E: E(δ n ) E(S n )/n, E(ϕ n ) E(Φ n )/ ( ) n The formula in the statement is then obtained by replacing E(δ n ) and E(ϕ n ) by these values. The expected values of S n and Φ n under the Yule and the uniform models are known: ( (n )!! ) E Y (S n ) n(h n ) E U (S n ) n (n 3)!! E Y (Φ n ) n(n ) n(h n ) E U (Φ n ) Ç å n ((n )!! ) (n 3)!! The formula for E Y (S n ) was proved in [5] and the other three, in [8]. To obtain the expected values of Dn, it remains to compute the expected values of Φ () n. They are given by the following result. Proposition 5. For every n, (a) E Y (Φ () n ) 5n(n ) 8n(H n ) (b) E U (Φ () n ) 6 n(4n + n 7) 3 )!! n(n + 3)(n 4 (n 3)!! This proposition is proved in the Appendix at the end of this paper. Finally, the identities given in Theorems and 3 are obtained by replacing, in the identity ) 6
7 given in Proposition 4, E(S n ), E(Φ n ), and E(Φ () n ) by their values under the Yule and the uniform models, respectively. We leave the last details to the reader. 4. An experiment on TreeBASE In this section we report on a very simple experiment to show how d ϕ, can be used to test evolutionary hypotheses. In this experiment, we have compared the expected value of d ϕ, on T n under the uniform and the Yule models with its average value on the set TreeBASE bin,n of binary phylogenetic trees with n leaves contained in TreeBASE [0]. To perform this experiment, we have taken some decisions. First, since there are only very few values n > 50 such that TreeBASE bin,n > 0, we have decided to consider only those binary trees contained in TreeBASE with n 50 leaves. On the other hand, even for those n such that TreeBASE bin,n is relatively large, in most cases it does not contain many pairs of trees with the same taxa. So, instead of computing the average value of d ϕ, on TreeBASE bin,n by averaging the values d ϕ,(t, T ) for pairs of trees T, T with exactly the same n taxa, we have made use of the formula given in Proposition 4, as if TreeBASE bin,n was closed under relabelings: that is, we have taken only into account the shapes of the trees contained in it. This is consistent with the fact that our final goal is to test models of evolution that produce tree shapes. So, we have computed the average values of Φ (), of the Sackin index S, and of the total cophenetic index Φ on TreeBASE bin,n, and we have taken as average value of d ϕ, on this set the result of appying the formula in Proposition 4. The detailed results of these computations, as well as the Python and R scripts used to compute and analyze them, are available in the Supplementary Material web page Fig. plots the log of these average values as a function of log(n). We have added the curves of the log of the expected values of Dn under the Yule distribution (lower, dotted curve) and under the uniform distribution (upper, dashed curve), again as a function of log(n). The graphic shows that the expected value of d ϕ, on (the shapes of) the phylogenetic trees contained in TreeBASE is better explained by the uniform model than by the Yule model. This agrees with the results of similar experiments using other measures (see, for instance, [3, 8]). 5. Conclusions and discussion In this paper we have obtained formulas for the expected values under the Yule and the uniform models of the square of the euclidean cophenetic metric d ϕ,, defined by the euclidean distance between cophenetic vectors. These formulas are explicit and hold on spaces T n of fully resolved phylogenetic trees with any number n of leaves. 7
8 log of the expected value of the square of the distance number of leaves Figure : Log-log plots of the mean of Dn for the binary trees in TreeBASE with a fixed number n of leaves, of E Y (Dn ) (dotted curve) and E U (Dn ) (dashed curve). These formulas have been obtained through long algebraic manipulations of sums of sequences. To double-check our results, we have computed the exact value of E Y (D n) and E U (D n) for n 3,..., 7, by generating all trees with up to 7 leaves. Moreover, we have computed numerical approximations to these values for n 0, 0,..., 00, by generating pairs of random trees until the numerical method stabilizes. These numerical experiments confirm that our formulas give the right figures. Table gives the exact values for n 3,..., 7. The results of the simulations for n 0, 0,..., 00, as well as the Python scripts used in these computations, are also available in the aforementioned Supplementary Material web page expectedcophdist/ E Y (Dn) E U (Dn) Table : Values of E Y (Dn ) and E U (Dn ) for n 3,..., 7. They agree with those given by our formulas. The formulas for E Y (D n) and E U (D n) grow in different orders: E Y (D n) is in Θ(n ), while E U (D n) is in Θ(n 3 ). Therefore, they can be used to test the Yule and the uniform models as null stochastic models of evolution for collections of phylogenetic trees reconstructed by different methods. We have reported on a first experiment of this type, which reinforces the conclusion that real world phylogenetic trees (that is, those contained in TreeBASE) are not consistent with the Yule model of evolution. We plan to report in a future paper on more 8
9 extensive tests on stochastic models of evolutionary processes, including Ford s α-model [] and Aldous β-model []. Acknowledgements The research reported in this paper has been partially supported by the Spanish government and the UE FEDER program, through projects MTM and TIN E. References [] M. Abramowitz, I. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover (964). [] D. Aldous. Probability distributions on cladograms. Random discrete structures, IMA Vol. Math. Appl. 76 (Springer,996), 8. [3] M. G. B. Blum, O. François, Which random processes describe the Tree of Life? A large-scale study of phylogenetic tree imbalance. Sys. Biol. 55 (006), [4] J. Brown, Probabilities of evolutionary trees. Syst. Biol. 43 (994), [5] D. Bryant, M. Steel, Computing the distribution of a tree metric. IEEE/ACM Trans. Comp. Biol. Bioinf. 6 (009), [6] G. Cardona, A. Mir, F. Rosselló, The expected value under the Yule model of the squared path-difference distance. Appl. Math. Let. 5 (0), [7] G. Cardona, A. Mir, F. Rosselló, Exact formulas for the variance of several balance indices under the Yule model. To appear in J. Math. Bio. http: //dx.doi.org/0.007/s [8] G. Cardona, A. Mir, L. Rotger, F. Rosselló, D. Sánchez, Cophenetic metrics for phylogenetic trees, after Sokal and Rohlf. BMC Bioinformatics (03) 4:3 [9] L. L. Cavalli-Sforza, A. Edwards, Phylogenetic analysis. Models and estimation procedures. Am. J. Hum. Genet., 9 (967), [0] J. Felsenstein, Inferring Phylogenies. Sinauer Associates Inc., 004. [] D. Ford. Probabilities on cladograms: Introduction to the alpha model. arxiv:math/0546 [math.pr] (005). [] Hypergeometric3F/03/07/0. 9
10 [3] HypergeometricF/03/03/0. [4] E. Harding, The probabilities of rooted tree-shapes generated by random bifurcation. Adv. Appl. Prob. 3 (97), [5] S. B. Heard, Patterns in Tree Balance among Cladistic, Phenetic, and Randomly Generated Phylogenetic Trees. Evolution 46 (99), [6] M. Hendy, C. Little, D. Penny, Comparing Trees with Pendant Vertices Labelled. SIAM J. Applied Math. 44 (984), [7] A. Mir, F. Rosselló, The mean value of the squared path-difference distance for rooted phylogenetic trees. J. Math. Anal. Appl. 37 (00), [8] A. Mir, F. Rosselló, L. Rotger, A new balance index for phylogenetic trees. Math. Biosc. 4 (03), [9] A. Mooers, S. B. Heard, Inferring evolutionary process from phylogenetic tree shape. Quart. Rev. Biol. 7 (997), [0] V. Morell, TreeBASE: the roots of phylogeny. Science 73 (996), [] M. Petkovsek, H. Wilf, D. Zeilberger, A B. AK Peters Ltd. (996). Available online at [] D. E. Rosen, Vicariant Patterns and Historical Explanation in Biogeography. Syst. Biol. 7 (978), [3] M. J. Sackin, Good and bad phenograms. Sys. Zool, (97), 5 6. [4] R. Sokal, F. Rohlf, The Comparison of Dendrograms by Objective Methods. Taxon (96), [5] M. Steel, Distribution of the symmetric difference metric on phylogenetic trees. SIAM J. Discr. Math. (988), [6] M. Steel, A. McKenzie, Distributions of cherries for two models of trees. Math. Biosc. 64 (000), 8 9. [7] M. Steel, A. McKenzie, Properties of phylogenetic trees generated by Yuletype speciation models. Math. Biosc. 70 (00), 9. [8] M. A. Steel, D. Penny, Distributions of tree comparison metrics some new results, Syst. Biol. 4 () (993) 6 4. [9] G. U. Yule, A mathematical theory of evolution based on the conclusions of Dr J. C. Willis. Phil. Trans. Royal Soc. (London) Series B 3 (94), 87. 0
11 Appendix: Proof of Proposition 5 Proof of Proposition 5.(a) For every T T n, let Φ(T ) S(T ) + Φ(T ) i j n ϕ T (i, j), and let Φ n be the random variable that chooses a tree T T n and computes Φ(T ). We have that E Y (Φ n ) E Y (S n ) + E Y (Φ n ) n(n ). To compute E Y (Φ () n ), we shall use an argument similar to the one used in the proof of [6, Prop. 3]. Notice that E Y (Φ () n ) Φ () (T ) py (T ) T T n S k {,...,n} S k k T k T (S k ) T n k T (Sc k ) Φ () (Tk T n k) p Y (T k T n k) Now, on the one hand, we have the following easy lemma on P Y (T T ): see [7, Lem. ]. Lemma 6. Let S k {,..., n} with S k k, let T k T (S k ) and T n k T (Sk c ). Then, P Y (T k T n k) (n ) ( n)p (T k )P (T n k). k On the other hand, we have the following recursive expression for Φ () (T T ). Lemma 7. Let S k {,..., n} with S k k, let T k T (S k ) and T n k T (Sk c ). Then Ç å Ç å Φ () (T k T n k) Φ () (T k )+Φ () k + n k + (T n k)+φ(t k )+Φ(T n k)+ +. Proof. Let us assume, without any loss of generality, that S {,..., m} and S {m +,..., n}. Then ϕ Tk (i, j) + if i, j k ϕ Tk T j) ϕ n k(i, T (i, j) + if k + i, j n n k 0 otherwise
12 and therefore Φ () (T k T n k) ϕ Tk T (i, j) n k i j n (ϕ Tk (i, j) + ) + (ϕ T (i, j) + ) n k i j k k+ i j n (ϕ Tk (i, j) + ϕ Tk (i, j) + ) + i j k Ç å k+ i j n Φ () k + (T k ) + Φ(T k ) + + Φ () (T n k) + Φ(T n k) + (ϕ T (i, j) + ϕ n k T (i, j) + ) n k Ç å n k +. So, if we set f(a, b) ( ) ( a+ + b+ ), we have that E Y (Φ () n ) Ç å n k T k T k ] T n k T n k +f(k, n k) (n ) ( n)p Y (T k )P Y (T n k) k [ Φ () (T k )P Y (T k )P Y (T n n k) T k T n k + Φ () (T n k)p Y (T k )P Y (T n k) T k T n k + Φ(T k )P Y (T k )P Y (T n k) T k T n k + Φ(T n k)p Y (T k )P Y (T n k) T k T n k + ] f(k, n k)p Y (T k )P Y (T n k) T k n T n k [ Φ () (T k ) + Φ () (T n k) + (Φ(T k ) + Φ(T n k)) [ Φ () (Tk )P Y (T k ) + Φ () (T n k)p Y (T n k) T k T n k + Φ(T k )P Y (T k ) + Φ(T n k)p Y (T n k) + f(k, n k) T k T n k [ E Y (Φ () k ) + E Y (Φ () n n k) + E Y (Φ k ) + E Y (Φ n k ) Ç å Ç å k + n k + ] + + n E Y (Φ () k ) + 4 n E Y (Φ k ) + n(n + ). 3 ]
13 In particular E Y (Φ ) n n and therefore E Y (Φ () n ) n n + n n E Y (Φ () k ) + 4 n n n n 4 n n E Y (Φ k ) + n(n ). 3 E Y (Φ () k ) + n E Y (Φ () ) E Y (Φ k ) + 4 n E Y (Φ ) + n n n(n ) + n 3 n n E Y (Φ () ) + n E Y (Φ () ) + 4 n E Y (Φ ) + n n n E Y (Φ () ) + 5n 8. Setting x n E Y (Φ () n )/n, this recurrence becomes x n x n and the solution of this recursive equation with x E Y (Φ () ) 0 is x n n k ( 5 8 ) 5(n ) 8(H n ) 5n + 3 8H n k from where we deduce that E Y (Φ () n ) 5n + 3n 8nH n, as we claimed. Proof of Proposition 5.(b) To compute E U (Φ () n ), we shall use an argument similar to the one used in [7]. For every k,..., n, let f k,n d k,n {T T n ϕ T (, ) k} {T T n ϕ T (i, j) k} for every i < j n {T T n δ T () k} {T T n δ T (i) k} for every i n (where X denotes the cardinal of the set X). Lemma 8. For every n, E U (Φ () ( n ) n k d k,n + (n 3)!! Ç å n n k f k,n ) 3
14 Proof. Under the uniform model, E U (Φ () T T n ) n Φ () (T ), (n 3)!! where Φ () (T ) ϕ T (i, j) T T n T T n i j n δ T (i) + ϕ T (i, j) i n T T n i<j n T T n k {T T n δ T (i) k} i n + i n i<j n k d k,n + n k d k,n + i j n n k {T T n ϕ T (i, j) k} Ç n i<j n å n n k f k,n k f k,n. T T n ϕ T (i, j) A formula for d k,n was obtained in the proof of [8, Lem. ]: (n k 3)! k d k,n. () (n k )!n k As far as f k,n goes, we have the following result. In it, and henceforth, p F q denotes the (generalized) hypergeometric function defined by Å ã a,..., a p pf q b,..., b ; z (a ) k (a p ) k zk q (b ) k (b q ) k k!, k 0 where (a) 0 and (a) k : a (a + ) (a + k ) for k. Lemma 9. For every n, f 0,n (n 4)!! and Å ã (n k 5)!k, n, k + n f k,n (n k 4)!! 3F k+5 k n, n + 3 ; for every k,..., n. Proof. Let us start by proving f 0,n (n 4)!! by induction on n. It is clear that f 0, ( 4)!!. Assume now that f 0, ((n ) 4)!!. Every phylogenetic tree T with n leaves such that ϕ T (, ) 0, that is, where LCA T (, ) is the root, is obtained by taking a phylogenetic tree T with n 4
15 leaves such that ϕ T (, ) 0 and adding a new pendant edge, ending in the leaf n, to any edge in T. Then, since there are f 0, (n 6)!! trees T T such that ϕ T (, ) 0, and each one of them has (n ) edges where we can add the new edge, we obtain f 0,n (n 4)(n 6)!! (n 4)!!. Now, to compute f k,n for k, we shall study the structure of a tree T T n such that ϕ T (, ) k; to simplify the notations, let us denote by x the node LCA T (, ), which has depth k, and by T 0 the subtree of T rooted at x. Then, on the one hand, T 0 is a phylogenetic tree on a subset S 0 {,..., n} containing,, and since its root x is the LCA of and in T, we have that ϕ T0 (, ) 0. On the other hand, there is a path (r v, v, v 3,..., v k+ x) in T from r to x. For every j,..., k, let T j be the subtree rooted at the child of v j other than v j+ ; see Fig. 3. So, the tree T is determined by: A number 0 m n k, so that m + will be the number of leaves of the phylogenetic tree T 0 rooted at LCA T (, ) A subset {i,..., i m } of {3,..., n}. There are ( ) n m such subsets. A phylogenetic tree T 0 on {,, i,..., i m } such that ϕ T0 (, ) 0. There are f 0,m+ (m)!! such trees. An ordered k-forest, that is, an ordered sequence of phylogenetic trees (T, T,..., T k ) such that k i L(T i) {,..., n} {,, i,..., i m }. The number of such ordered k-forests is (see, for instance, [7, Lem. ]) (n m k 5)!k (n m k )! n m k.... T T x T 0 T k Figure 3: The structure of a tree T with ϕ T (, ) k. 5
16 This shows that f k,n can be computed as f k,n k n k m0 n k m0 n k m0 (n )!k n k (number of ways of choosing {i,..., i m }) (number of trees in T m+ with ϕ T (, ) 0) (number of ordered k-forests on n m leaves) Ç å n (m)!! m (n m k 5)!k (n m k )! n m k (n )!m! m (n m k 5)! m!(n m )!(n m k )! n m k n k m0 Now, taking into account that () m m! 4 m (n m k 5)! (n m )!(n m k )! m (n )! ( n) m ( ) (n m )! m (n k )! (k + n) m ( ) Å ã k + 5 n Å ã k n + 3 we have that Å ã, n, k + n 3F k+5 k n, n + 3 ; m 0 m 0 n k m0 m m (n k m )! ( )m (n k 5)!! m (n k m 5)!!, ( )m (n k 6)!! m (n k m 6)!! () m ( n) m (k + n) m ( k+5 n) m ( k n + 3) m m! m!(n )!(n k )! m (n k m 5)!! m (n k m 6)!! (n m )!(n k m )!(n k 5)!!(n k 6)!!m! (n )!(n k )!(n k m 5)! m (n m )!(n k m )!(n k 5)! (n )!(n k )! (n k 5)! from where we deduce that n k m0 n k m0 (n k m 5)!4 m (n m )!(n k m )! (n k m 5)!4 m (n m )!(n k m )! Å ã (n k 5)!, n, k + n (n )!(n k )! 3 F k+5 k n, n + 3 ; 6
17 and hence f k,n as we claimed. n k (n )!k 4 m (n m k 5)! n k (n m )!(n m k )! m0 Å ã (n )!k n k (n k 5)!, n, k + n (n )!(n k )! 3 F k+5 k n, n + 3 ; Å ã (n k 5)!k, n, k + n (n k 4)!! 3F k+5 k n, n + 3 ; We must compute now the sums k d k,n, n k f k,n. To do that, we shall use the following auxiliary lemma. Lemma 0. For every n and m, let Then, U n,m n k0 k m (n + k )! k! k. U n,0 (n 4)!! U n, (n )(n 4)!! (n 3)!! U n, (n )(n 4)!! (n )(n 3)!! U n,3 (n 3 + 3n 3n )(n 4)!! (3n + n )(n 3)!! Proof. The proof of these identities is standard, using well known equalities for hypergeometric functions and the lookup algorithm given in [, p. 36]. We shall prove in detail the identity for m, and we leave the details of the rest to the reader. Notice that U n, n k0 k0 k n (n + k )! k (n + k )! k! k k! k (k + ) (n + k )! (k + )! k+ kn n 3 k0 (k + ) (n + k )! (k + )! k+ (k + ) (n + k )! (k + )! k+ Set X n k0 (k + ) (n + k )! (k + )! k+, Y n kn (k + ) (n + k )! (k + )! k+ We compute now these two summands. 7
18 As to X n, If we set we have that X n (n )! k0 (k + ) (n + k )! (n )!(k + )! k t k (k + ) (n + k )! (n )!(k + )! k, t k+ (k + )(k + n) t k (k + ) and therefore, by the lookup algorithm [, p. 36], we have that Å (n )!, n X n F ; ã Å ã (n )! n n, F ; (using (5.3.4) in [, p. 559]) (n )! (n) k ( ) k ( )k () k k! k 0 ( (n )! (n)0 ( ) 0 ( )0 + (n) ) ( ) ( ) () 0 0! ()! (n )! (n + ) As to Y n, If we take now we have that (k + n ) (n + k 3)! Y n (k + n )! k+ k0 (n ) (n 3)! (k + n ) (n + k 3)! (n )! k0 (k + n )! k () (n 3)! ()! t k (k + n ) (n + k 3)! (k + n )! k () (n 3)! ()! t k+ (n + k)(n + k ) t k (k + n ) and therefore, again by the lookup algorithm [, p. 36], we have that Y n (n Å ) (n 3)!, n, n (n )! 3F n, n ; (n ) (n 3)! [ Å n, (n )! F n ; ã ã + n F Å n, n ; ã] (using []). 8
19 Now Å n, F n ; ã Å n, F n (n ) Γ(n ) [ Γ(n ) Γ(n ) Γ(0) (n ) (n )! [ (n )! + (n 3)! (n )! ã ; + Γ(n) Γ() + Γ(n ) (n 3)!! ] (using (5.3.4) in [, p. 559]) ] Γ( ) (using [3]) + (n 3)!! Å n, F n ; ã Å ã, n F n ; (using (5.3.4) in [, p. 559]) Γ(n) ( Γ(n 4 ) ( n) Γ(n ) Γ( ) + Γ(n + ) ) Γ( 3 ) + Γ(n) (using [3]) n (n )! ( (n 3)!! (n )!! ) (n )! + + (n )! Therefore, (n )! ((n 3)!! + (n )!! + n (n )!) (n )! Y n (n ) (n 3)! [ (n )! (n )! + (n 3)!! + n (n )! ] ((n 3)!! + (n )!! + n (n )!) (n )! n (n + )(n )! + (n )!! and finally as we claimed. U n, X n Y n n (n + )(n )! (n )!! (n )(n 4)!! (n )(n 3)!! Lemma. For every n, k d k,n (4n )(n 3)!! 3(n )!!. Proof. By equation (), k d k,n k 3 n (n k 3)! (n k )! n k k0 (n k ) 3 (n + k )! k! k (n ) 3 U n,0 3(n ) U n, + 3(n )U n, U n,3 (n ) 3 (n 4)!! 3(n ) ( (n )(n 4)!! (n 3)!! ) +3(n ) ( (n )(n 4)!! (n )(n 3)!! ) ( (n 3 + 3n 3n )(n 4)!! (3n + n )(n 3)!! ) (4n )(n 3)!! 3(n )(n 4)!!. 9
20 Lemma. For every n, n k f k,n 3 (4n + )(n 3)!! 3 (n )!!. Proof. To simplify the notations, set S n n proof of Lemma 9, and therefore S n Set now S n f k,n (n )! n (n )!k n k n n n k m0 n k k k 3 m0 k f k,n. As we have seen in the 4 m (n m k 5)! (n m )!(n m k )! 4 m (n k m 5)! (n k )!(n k m )! (n )! n k n k k 3 4 n k m (k + m )! (k + m)!m! m0 ( n (n )! n k 3 n k Ç å) k k + k + m 4 m m k + m ( m (n )! n 6 n n + n + k 3 n k Ç å) k + m k 4 m m k + m m n k 3 n k k m 4 m m Since S 3 0, we have that Ç å k + m k + m n 3 S n (S p+ S p) p3 k 3 n k k m 4 m m Ç å k + m k + m and S p+ S p p 3 Ç (p )3 k 3 p k 3 p + k (p k )4 p k p (p )3 p + p (p )3 p + (p )3 p + p 3 p (p )! p (p )! å k 3 (p k 3)! k (p k )(p )!(p k )! p 3 p 3 k 3 (p k 3)! k (p k )! (p k ) 3 (p + k )! k p+ (k + )! 0
21 (p )3 p + (p )3 p + p (p )! p k [ p (p k ) 3 (p + k )! k k! (p k ) 3 (p + k )! p (p )! k k! k0 (p ) 3 (p )! ] (p )3 (p )! p (p ) (p k ) 3 (p + k )! p + p (p )! k k! k0 (p ) ( ) p + (4p )(p 3)!! 3(p )!! (p )!! (p ) (p 3)!! p + (4p ) (p )!! 3 Therefore ( S n (p 3)!! (4p ) (p )!! p3 (p ) ) p 3 (by Lemma ) Now, applying Gosper s algorithm [, p. 77] we have that (p 3)!! (4p ) (p )!! ( Ç å n 3 ) 3 n+ 3(4n 3n ) 39 n n p3 and then S n ( Ç å n 3 ) 3 n+ 3(4n 3n ) 39 n n n 8(n + ) n+ 3(n 3) n + (4n + )(n 3)!! 3(n + ) +. n 3(n 4)!! Finally, S n Å ã (n )! n 6 n + n + S n 3(n )! n (4n + )(n 3)!! (4n + )(n 3)!! 3 (n )!!.
22 Summarizing, by Lemmas 8,, and, we have that as we claimed. ( Ç å n n ) n k d k,n + k f k,n (n 3)!! [ n((4n )(n 3)!! 3(n )!!) (n 3)!! Ç å n ( + 3 (4n + )(n 3)!! 3 )] (n )!! E U (Φ () n ) 6 n(4n + n 7) 3n(n + 3) 4 (n )!! (n 3)!!
arxiv: v1 [q-bio.pe] 13 Aug 2015
Noname manuscript No. (will be inserted by the editor) On joint subtree distributions under two evolutionary models Taoyang Wu Kwok Pui Choi arxiv:1508.03139v1 [q-bio.pe] 13 Aug 2015 August 14, 2015 Abstract
More informationProbabilities of Evolutionary Trees under a Rate-Varying Model of Speciation
Probabilities of Evolutionary Trees under a Rate-Varying Model of Speciation Mike Steel Biomathematics Research Centre University of Canterbury, Private Bag 4800 Christchurch, New Zealand No. 67 December,
More informationOn joint subtree distributions under two evolutionary models
Noname manuscript No. (will be inserted by the editor) On joint subtree distributions under two evolutionary models Taoyang Wu Kwok Pui Choi November 5, 2015 Abstract In population and evolutionary biology,
More informationDISTRIBUTIONS OF CHERRIES FOR TWO MODELS OF TREES
DISTRIBUTIONS OF CHERRIES FOR TWO MODELS OF TREES ANDY McKENZIE and MIKE STEEL Biomathematics Research Centre University of Canterbury Private Bag 4800 Christchurch, New Zealand No. 177 May, 1999 Distributions
More informationAn Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees
An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees Francesc Rosselló 1, Gabriel Valiente 2 1 Department of Mathematics and Computer Science, Research Institute
More informationProbabilities on cladograms: introduction to the alpha model
arxiv:math/0511246v1 [math.pr] 9 Nov 2005 Probabilities on cladograms: introduction to the alpha model Daniel J. Ford Abstract This report introduces the alpha model. The alpha model is a one parameter
More informationarxiv: v1 [q-bio.pe] 1 Jun 2014
THE MOST PARSIMONIOUS TREE FOR RANDOM DATA MAREIKE FISCHER, MICHELLE GALLA, LINA HERBST AND MIKE STEEL arxiv:46.27v [q-bio.pe] Jun 24 Abstract. Applying a method to reconstruct a phylogenetic tree from
More informationApplications of Analytic Combinatorics in Mathematical Biology (joint with H. Chang, M. Drmota, E. Y. Jin, and Y.-W. Lee)
Applications of Analytic Combinatorics in Mathematical Biology (joint with H. Chang, M. Drmota, E. Y. Jin, and Y.-W. Lee) Michael Fuchs Department of Applied Mathematics National Chiao Tung University
More informationMaximum Agreement Subtrees
Maximum Agreement Subtrees Seth Sullivant North Carolina State University March 24, 2018 Seth Sullivant (NCSU) Maximum Agreement Subtrees March 24, 2018 1 / 23 Phylogenetics Problem Given a collection
More informationOn the Subnet Prune and Regraft distance
On the Subnet Prune and Regraft distance Jonathan Klawitter and Simone Linz Department of Computer Science, University of Auckland, New Zealand jo. klawitter@ gmail. com, s. linz@ auckland. ac. nz arxiv:805.07839v
More informationA CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES
A CLUSTER REDUCTION FOR COMPUTING THE SUBTREE DISTANCE BETWEEN PHYLOGENIES SIMONE LINZ AND CHARLES SEMPLE Abstract. Calculating the rooted subtree prune and regraft (rspr) distance between two rooted binary
More informationarxiv: v1 [cs.cc] 9 Oct 2014
Satisfying ternary permutation constraints by multiple linear orders or phylogenetic trees Leo van Iersel, Steven Kelk, Nela Lekić, Simone Linz May 7, 08 arxiv:40.7v [cs.cc] 9 Oct 04 Abstract A ternary
More informationCherry picking: a characterization of the temporal hybridization number for a set of phylogenies
Bulletin of Mathematical Biology manuscript No. (will be inserted by the editor) Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies Peter J. Humphries Simone
More informationA concise proof of Kruskal s theorem on tensor decomposition
A concise proof of Kruskal s theorem on tensor decomposition John A. Rhodes 1 Department of Mathematics and Statistics University of Alaska Fairbanks PO Box 756660 Fairbanks, AK 99775 Abstract A theorem
More informationNOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS
NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS PETER J. HUMPHRIES AND CHARLES SEMPLE Abstract. For two rooted phylogenetic trees T and T, the rooted subtree prune and regraft distance
More informationPhylogenetic Networks, Trees, and Clusters
Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University
More informationarxiv: v1 [math.ra] 13 Jan 2009
A CONCISE PROOF OF KRUSKAL S THEOREM ON TENSOR DECOMPOSITION arxiv:0901.1796v1 [math.ra] 13 Jan 2009 JOHN A. RHODES Abstract. A theorem of J. Kruskal from 1977, motivated by a latent-class statistical
More informationTree Models for Macro-Evolution and Phylogenetic Analysis
Tree Models for Macro-Evolution and Phylogenetic Analysis Graham Jones 5th February 2009 email: art@gjones.name web site: www.indriid.com Abstract It has long been recognized that phylogenetic trees are
More informationarxiv: v1 [math.pr] 12 Jan 2017
Central limit theorem for the Horton-Strahler bifurcation ratio of general branch order Ken Yamamoto Department of Physics and arth Sciences, Faculty of Science, University of the Ryukyus, 1 Sembaru, Nishihara,
More informationTHE TRIPLES DISTANCE FOR ROOTED BIFURCATING PHYLOGENETIC TREES
Syst. Biol. 45(3):33-334, 1996 THE TRIPLES DISTANCE FOR ROOTED BIFURCATING PHYLOGENETIC TREES DOUGLAS E. CRITCHLOW, DENNIS K. PEARL, AND CHUNLIN QIAN Department of Statistics, Ohio State University, Columbus,
More informationThe Moran Process as a Markov Chain on Leaf-labeled Trees
The Moran Process as a Markov Chain on Leaf-labeled Trees David J. Aldous University of California Department of Statistics 367 Evans Hall # 3860 Berkeley CA 94720-3860 aldous@stat.berkeley.edu http://www.stat.berkeley.edu/users/aldous
More informationSolving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.
Solving the Maximum Agreement Subtree and Maximum Compatible Tree problems on bounded degree trees LIRMM, Montpellier France 4th July 2006 Introduction The Mast and Mct problems: given a set of evolutionary
More informationPLEASE SCROLL DOWN FOR ARTICLE
This article was downloaded by:[ujf - INP Grenoble SICD 1] [UJF - INP Grenoble SICD 1] On: 6 June 2007 Access Details: [subscription number 772812057] Publisher: Taylor & Francis Informa Ltd Registered
More informationTHE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT
COMMUNICATIONS IN INFORMATION AND SYSTEMS c 2009 International Press Vol. 9, No. 4, pp. 295-302, 2009 001 THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT DAN GUSFIELD AND YUFENG WU Abstract.
More informationProperties of normal phylogenetic networks
Properties of normal phylogenetic networks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu August 13, 2009 Abstract. A phylogenetic network is
More informationGreedy Trees, Caterpillars, and Wiener-Type Graph Invariants
Georgia Southern University Digital Commons@Georgia Southern Mathematical Sciences Faculty Publications Mathematical Sciences, Department of 2012 Greedy Trees, Caterpillars, and Wiener-Type Graph Invariants
More informationOn the mean connected induced subgraph order of cographs
AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 71(1) (018), Pages 161 183 On the mean connected induced subgraph order of cographs Matthew E Kroeker Lucas Mol Ortrud R Oellermann University of Winnipeg Winnipeg,
More informationGap Embedding for Well-Quasi-Orderings 1
WoLLIC 2003 Preliminary Version Gap Embedding for Well-Quasi-Orderings 1 Nachum Dershowitz 2 and Iddo Tzameret 3 School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel Abstract Given a
More informationA Geometric Approach to Tree Shape Statistics
Syst. Biol. 55(4):652 661, 2006 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150600889617 A Geometric Approach to Tree Shape Statistics FREDERICK
More informationHistorical Biogeography. Historical Biogeography. Systematics
Historical Biogeography I. Definitions II. Fossils: problems with fossil record why fossils are important III. Phylogeny IV. Phenetics VI. Phylogenetic Classification Disjunctions debunked: Examples VII.
More informationLet S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT
More informationOn Symmetries of Non-Plane Trees in a Non-Uniform Model
On Symmetries of Non-Plane Trees in a Non-Uniform Model Jacek Cichoń Abram Magner Wojciech Spankowski Krystof Turowski October 3, 26 Abstract Binary trees come in two varieties: plane trees, often simply
More informationMAXIMAL CLADES IN RANDOM BINARY SEARCH TREES
MAXIMAL CLADES IN RANDOM BINARY SEARCH TREES SVANTE JANSON Abstract. We study maximal clades in random phylogenetic trees with the Yule Harding model or, equivalently, in binary search trees. We use probabilistic
More informationRECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION
RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION MAGNUS BORDEWICH, KATHARINA T. HUBER, VINCENT MOULTON, AND CHARLES SEMPLE Abstract. Phylogenetic networks are a type of leaf-labelled,
More informationEvolutionary Tree Analysis. Overview
CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based
More informationOn improving matchings in trees, via bounded-length augmentations 1
On improving matchings in trees, via bounded-length augmentations 1 Julien Bensmail a, Valentin Garnero a, Nicolas Nisse a a Université Côte d Azur, CNRS, Inria, I3S, France Abstract Due to a classical
More informationarxiv: v1 [q-bio.pe] 3 May 2016
PHYLOGENETIC TREES AND EUCLIDEAN EMBEDDINGS MARK LAYER AND JOHN A. RHODES arxiv:1605.01039v1 [q-bio.pe] 3 May 2016 Abstract. It was recently observed by de Vienne et al. that a simple square root transformation
More informationarxiv:math/ v2 [math.qa] 30 Mar 2005
THE OPERAD QUAD IS KOSZUL JON EIVIND VATNE arxiv:math/04580v2 [math.qa] 30 Mar 2005 Abstract. The purpose of this paper is to prove the koszulity of the operad Quad, governing quadri-algebras. That Quad
More informationNon-independence in Statistical Tests for Discrete Cross-species Data
J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,
More informationarxiv: v1 [math.co] 25 Jun 2014
THE NON-PURE VERSION OF THE SIMPLEX AND THE BOUNDARY OF THE SIMPLEX NICOLÁS A. CAPITELLI arxiv:1406.6434v1 [math.co] 25 Jun 2014 Abstract. We introduce the non-pure versions of simplicial balls and spheres
More informationA Phylogenetic Network Construction due to Constrained Recombination
A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer
More informationCoding and Counting Arrangements of Pseudolines
Coding and Counting Arrangements of Pseudolines Stefan Felsner Institut für Mathematik Technische Universität Berlin Strasse des 17. Juni 136 D-1063 Berlin, Germany Pavel Valtr Department of Applied Mathematics
More informationChapter 11. Min Cut Min Cut Problem Definition Some Definitions. By Sariel Har-Peled, December 10, Version: 1.
Chapter 11 Min Cut By Sariel Har-Peled, December 10, 013 1 Version: 1.0 I built on the sand And it tumbled down, I built on a rock And it tumbled down. Now when I build, I shall begin With the smoke from
More informationThe maximum agreement subtree problem
The maximum agreement subtree problem arxiv:1201.5168v5 [math.co] 20 Feb 2013 Daniel M. Martin Centro de Matemática, Computação e Cognição Universidade Federal do ABC, Santo André, SP 09210-170 Brasil.
More informationk-protected VERTICES IN BINARY SEARCH TREES
k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from
More informationMath 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E.
Math 239: Discrete Mathematics for the Life Sciences Spring 2008 Lecture 14 March 11 Lecturer: Lior Pachter Scribe/ Editor: Maria Angelica Cueto/ C.E. Csar 14.1 Introduction The goal of today s lecture
More informationDNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi
DNA Phylogeny Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi Phylogenetics Grouping and Division of organisms Keeps changing with time Splitting, hybridization and termination Cladistics :
More informationBounded Treewidth Graphs A Survey German Russian Winter School St. Petersburg, Russia
Bounded Treewidth Graphs A Survey German Russian Winter School St. Petersburg, Russia Andreas Krause krausea@cs.tum.edu Technical University of Munich February 12, 2003 This survey gives an introduction
More informationDesign and Analysis of Algorithms
CSE 101, Winter 2018 Design and Analysis of Algorithms Lecture 5: Divide and Conquer (Part 2) Class URL: http://vlsicad.ucsd.edu/courses/cse101-w18/ A Lower Bound on Convex Hull Lecture 4 Task: sort the
More informationBinary Decision Diagrams. Graphs. Boolean Functions
Binary Decision Diagrams Graphs Binary Decision Diagrams (BDDs) are a class of graphs that can be used as data structure for compactly representing boolean functions. BDDs were introduced by R. Bryant
More informationScalar and fuzzy cardinalities of crisp and fuzzy multisets
Scalar and fuzzy cardinalities of crisp and fuzzy multisets Jaume Casasnovas, Francesc Rosselló Department of Mathematics and Computer Science, University of the Balearic Islands, 07122 Palma de Mallorca
More informationThe Subtree Size Profile of Plane-oriented Recursive Trees
The Subtree Size Profile of Plane-oriented Recursive Trees Michael Fuchs Department of Applied Mathematics National Chiao Tung University Hsinchu, Taiwan ANALCO11, January 22nd, 2011 Michael Fuchs (NCTU)
More informationThe asymptotic number of prefix normal words
The asymptotic number of prefix normal words Paul Balister Stefanie Gerke July 6, 2018 Abstract We show that the number of prefix normal binary words of length n is 2 n Θ((log n)2). We also show that the
More informationOn the Average Path Length of Complete m-ary Trees
1 2 3 47 6 23 11 Journal of Integer Sequences, Vol. 17 2014, Article 14.6.3 On the Average Path Length of Complete m-ary Trees M. A. Nyblom School of Mathematics and Geospatial Science RMIT University
More informationReconstruction of certain phylogenetic networks from their tree-average distances
Reconstruction of certain phylogenetic networks from their tree-average distances Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu October 10,
More informationPattern Popularity in 132-Avoiding Permutations
Pattern Popularity in 132-Avoiding Permutations The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher Rudolph,
More informationA 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES. 1. Introduction
A 3-APPROXIMATION ALGORITHM FOR THE SUBTREE DISTANCE BETWEEN PHYLOGENIES MAGNUS BORDEWICH 1, CATHERINE MCCARTIN 2, AND CHARLES SEMPLE 3 Abstract. In this paper, we give a (polynomial-time) 3-approximation
More informationA Polynomial Time Deterministic Algorithm for Identity Testing Read-Once Polynomials
A Polynomial Time Deterministic Algorithm for Identity Testing Read-Once Polynomials Daniel Minahan Ilya Volkovich August 15, 2016 Abstract The polynomial identity testing problem, or PIT, asks how we
More informationarxiv: v1 [math.co] 3 Nov 2014
SPARSE MATRICES DESCRIBING ITERATIONS OF INTEGER-VALUED FUNCTIONS BERND C. KELLNER arxiv:1411.0590v1 [math.co] 3 Nov 014 Abstract. We consider iterations of integer-valued functions φ, which have no fixed
More informationPhylogenetic Algebraic Geometry
Phylogenetic Algebraic Geometry Seth Sullivant North Carolina State University January 4, 2012 Seth Sullivant (NCSU) Phylogenetic Algebraic Geometry January 4, 2012 1 / 28 Phylogenetics Problem Given a
More informationThe Chinese Remainder Theorem
Chapter 5 The Chinese Remainder Theorem 5.1 Coprime moduli Theorem 5.1. Suppose m, n N, and gcd(m, n) = 1. Given any remainders r mod m and s mod n we can find N such that N r mod m and N s mod n. Moreover,
More informationBinary Decision Diagrams
Binary Decision Diagrams Binary Decision Diagrams (BDDs) are a class of graphs that can be used as data structure for compactly representing boolean functions. BDDs were introduced by R. Bryant in 1986.
More informationWorst case analysis for a general class of on-line lot-sizing heuristics
Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University
More informationThe Complexity of Constructing Evolutionary Trees Using Experiments
The Complexity of Constructing Evolutionary Trees Using Experiments Gerth Stlting Brodal 1,, Rolf Fagerberg 1,, Christian N. S. Pedersen 1,, and Anna Östlin2, 1 BRICS, Department of Computer Science, University
More informationCS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003
CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1 Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 Lecturer: Wing-Kin Sung Scribe: Ning K., Shan T., Xiang
More informationThe edge-diametric theorem in Hamming spaces
Discrete Applied Mathematics 56 2008 50 57 www.elsevier.com/locate/dam The edge-diametric theorem in Hamming spaces Christian Bey Otto-von-Guericke-Universität Magdeburg, Institut für Algebra und Geometrie,
More informationSpectrally arbitrary star sign patterns
Linear Algebra and its Applications 400 (2005) 99 119 wwwelseviercom/locate/laa Spectrally arbitrary star sign patterns G MacGillivray, RM Tifenbach, P van den Driessche Department of Mathematics and Statistics,
More informationDistribution of the Number of Encryptions in Revocation Schemes for Stateless Receivers
Discrete Mathematics and Theoretical Computer Science DMTCS vol. subm., by the authors, 1 1 Distribution of the Number of Encryptions in Revocation Schemes for Stateless Receivers Christopher Eagle 1 and
More informationA Cubic-Vertex Kernel for Flip Consensus Tree
To appear in Algorithmica A Cubic-Vertex Kernel for Flip Consensus Tree Christian Komusiewicz Johannes Uhlmann Received: date / Accepted: date Abstract Given a bipartite graph G = (V c, V t, E) and a nonnegative
More informationReconstructing Trees from Subtree Weights
Reconstructing Trees from Subtree Weights Lior Pachter David E Speyer October 7, 2003 Abstract The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree
More informationAcyclic and Oriented Chromatic Numbers of Graphs
Acyclic and Oriented Chromatic Numbers of Graphs A. V. Kostochka Novosibirsk State University 630090, Novosibirsk, Russia X. Zhu Dept. of Applied Mathematics National Sun Yat-Sen University Kaohsiung,
More informationThe Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth
The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth Gregory Gutin, Mark Jones, and Magnus Wahlström Royal Holloway, University of London Egham, Surrey TW20 0EX, UK Abstract In the
More informationd(ν) = max{n N : ν dmn p n } N. p d(ν) (ν) = ρ.
1. Trees; context free grammars. 1.1. Trees. Definition 1.1. By a tree we mean an ordered triple T = (N, ρ, p) (i) N is a finite set; (ii) ρ N ; (iii) p : N {ρ} N ; (iv) if n N + and ν dmn p n then p n
More informationCharles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel
SUPERTREE ALGORITHMS FOR ANCESTRAL DIVERGENCE DATES AND NESTED TAXA Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel Department of Mathematics and Statistics University of Canterbury
More informationUnavoidable subtrees
Unavoidable subtrees Maria Axenovich and Georg Osang July 5, 2012 Abstract Let T k be a family of all k-vertex trees. For T T k and a tree T, we write T T if T contains at least one of the trees from T
More informationSOME TRANSFINITE INDUCTION DEDUCTIONS
SOME TRANSFINITE INDUCTION DEDUCTIONS SYLVIA DURIAN Abstract. This paper develops the ordinal numbers and transfinite induction, then demonstrates some interesting applications of transfinite induction.
More informationarxiv: v1 [math.ca] 4 Oct 2017
ASYMPTOTICS OF SIGNED BERNOULLI CONVOLUTIONS SCALED BY MULTINACCI NUMBERS arxiv:1710.01780v1 [math.ca] 4 Oct 2017 XIANGHONG CHEN AND TIAN-YOU HU Abstract. We study the signed Bernoulli convolution ( ν
More information"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley D.D. Ackerly Feb. 26, 2018 Maximum Likelihood Principles, and Applications to
More informationPrime Factorization and Domination in the Hierarchical Product of Graphs
Prime Factorization and Domination in the Hierarchical Product of Graphs S. E. Anderson 1, Y. Guo 2, A. Rubin 2 and K. Wash 2 1 Department of Mathematics, University of St. Thomas, St. Paul, MN 55105 2
More informationNJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees
NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana
More informationSOLUTIONS OF SEMILINEAR WAVE EQUATION VIA STOCHASTIC CASCADES
Communications on Stochastic Analysis Vol. 4, No. 3 010) 45-431 Serials Publications www.serialspublications.com SOLUTIONS OF SEMILINEAR WAVE EQUATION VIA STOCHASTIC CASCADES YURI BAKHTIN* AND CARL MUELLER
More informationAnalysis of Approximate Quickselect and Related Problems
Analysis of Approximate Quickselect and Related Problems Conrado Martínez Univ. Politècnica Catalunya Joint work with A. Panholzer and H. Prodinger LIP6, Paris, April 2009 Introduction Quickselect finds
More informationGraphs with few total dominating sets
Graphs with few total dominating sets Marcin Krzywkowski marcin.krzywkowski@gmail.com Stephan Wagner swagner@sun.ac.za Abstract We give a lower bound for the number of total dominating sets of a graph
More informationMATH 56A: STOCHASTIC PROCESSES CHAPTER 2
MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 2. Countable Markov Chains I started Chapter 2 which talks about Markov chains with a countably infinite number of states. I did my favorite example which is on
More informationDefinability in the Enumeration Degrees
Definability in the Enumeration Degrees Theodore A. Slaman W. Hugh Woodin Abstract We prove that every countable relation on the enumeration degrees, E, is uniformly definable from parameters in E. Consequently,
More informationarxiv: v1 [cs.ds] 1 Nov 2018
An O(nlogn) time Algorithm for computing the Path-length Distance between Trees arxiv:1811.00619v1 [cs.ds] 1 Nov 2018 David Bryant Celine Scornavacca November 5, 2018 Abstract Tree comparison metrics have
More informationEXACT DOUBLE DOMINATION IN GRAPHS
Discussiones Mathematicae Graph Theory 25 (2005 ) 291 302 EXACT DOUBLE DOMINATION IN GRAPHS Mustapha Chellali Department of Mathematics, University of Blida B.P. 270, Blida, Algeria e-mail: mchellali@hotmail.com
More informationAPPLICATION OF HOARE S THEOREM TO SYMMETRIES OF RIEMANN SURFACES
Annales Academiæ Scientiarum Fennicæ Series A. I. Mathematica Volumen 18, 1993, 307 3 APPLICATION OF HOARE S THEOREM TO SYMMETRIES OF RIEMANN SURFACES E. Bujalance, A.F. Costa, and D. Singerman Universidad
More informationHomotopy and homology groups of the n-dimensional Hawaiian earring
F U N D A M E N T A MATHEMATICAE 165 (2000) Homotopy and homology groups of the n-dimensional Hawaiian earring by Katsuya E d a (Tokyo) and Kazuhiro K a w a m u r a (Tsukuba) Abstract. For the n-dimensional
More informationMinimum evolution using ordinary least-squares is less robust than neighbor-joining
Minimum evolution using ordinary least-squares is less robust than neighbor-joining Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA email: swillson@iastate.edu November
More informationReconstruire le passé biologique modèles, méthodes, performances, limites
Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire
More informationSubtree Transfer Operations and Their Induced Metrics on Evolutionary Trees
nnals of Combinatorics 5 (2001) 1-15 0218-0006/01/010001-15$1.50+0.20/0 c Birkhäuser Verlag, Basel, 2001 nnals of Combinatorics Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees
More informationEnumerating split-pair arrangements
Journal of Combinatorial Theory, Series A 115 (28 293 33 www.elsevier.com/locate/jcta Enumerating split-pair arrangements Ron Graham 1, Nan Zang Department of Computer Science, University of California
More informationAverage Case Analysis of QuickSort and Insertion Tree Height using Incompressibility
Average Case Analysis of QuickSort and Insertion Tree Height using Incompressibility Tao Jiang, Ming Li, Brendan Lucier September 26, 2005 Abstract In this paper we study the Kolmogorov Complexity of a
More informationarxiv: v2 [math.pr] 23 Feb 2017
TWIN PEAKS KRZYSZTOF BURDZY AND SOUMIK PAL arxiv:606.08025v2 [math.pr] 23 Feb 207 Abstract. We study random labelings of graphs conditioned on a small number (typically one or two) peaks, i.e., local maxima.
More informationAlgorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,
Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837
More informationarxiv: v1 [math.pr] 21 Mar 2014
Asymptotic distribution of two-protected nodes in ternary search trees Cecilia Holmgren Svante Janson March 2, 24 arxiv:4.557v [math.pr] 2 Mar 24 Abstract We study protected nodes in m-ary search trees,
More informationPhylogenetic Tree Shape
Phylogenetic Tree Shape Arne Mooers Simon Fraser University Vancouver, Canada Discussions with generous colleagues & students: Mark Pagel, Sean Nee (Tree Statistics) Mike Steel (Tree shapes) Steve Heard,
More informationImproved maximum parsimony models for phylogenetic networks
Improved maximum parsimony models for phylogenetic networks Leo van Iersel Mark Jones Celine Scornavacca December 20, 207 Abstract Phylogenetic networks are well suited to represent evolutionary histories
More informationshould be presented and explained in the combined species tree (Fitch, 1970; Goodman et al., 1979). The gene divergence can be the results of either s
On a Mirkin-Muchnik-Smith Conjecture for Comparing Molecular Phylogenies Louxin Zhang lxzhang@iss.nus.sg BioInformatics Center Institute of Systems Science Heng Mui Keng Terrace Singapore 119597 Abstract
More information