LOWER BOUNDS ON SEQUENCE LENGTHS REQUIRED TO RECOVER THE EVOLUTIONARY TREE. (extended abstract submitted to RECOMB '99)


MIKLÓS CSŰRÖS AND MING-YANG KAO

Abstract. In this paper we study the sequence length requirements of distance-based evolutionary tree building algorithms in the Jukes-Cantor model of evolution. By deriving lower bounds on the sequence lengths required to recover the evolutionary tree topology correctly, we show that two algorithms, the Short Quartet Method and the Harmonic Greedy Triplets algorithm, have optimal sequence length requirements.

1. Introduction. A main area of computational biology is the development and analysis of evolutionary tree building algorithms [15]. By using biomolecular sequences, these algorithms have not only enabled the exploration of evolutionary relationships among species but also led to the discovery of new proteins and to the inference of transmission chains of viral diseases such as AIDS [19].

The evolutionary tree in biology can be defined as an edge-weighted binary tree in which the nodes correspond to taxa and the edge weights correspond to the time of divergence between them [23]. Evolution in this model can be viewed as the broadcasting of a character sequence (the root sequence) along the edges from the root towards the leaves. On each edge, the sequence undergoes a certain number of changes or mutations, and the mutated sequences are observed at the leaves. By defining a meaningful mutation model that relates changes in character sequences to time of divergence, one can attempt to reconstruct the weighted binary tree. The primary goal of such reconstruction is to correctly recover the tree topology, i.e., the tree without the edge weights. A secondary goal is to estimate the edge weights as correctly as possible.
A probabilistic model of mutations defines a probability distribution over sets of $n$ sequences of length $\ell$, where $n$ is the number of leaves and $\ell$ is the length of an observed sequence. An obvious requirement for a tree building algorithm, which we call computational efficiency, is that it runs in time polynomial in $n$ and $\ell$. It is also important to realize that the sequences are rather short, as the length of currently available biomolecular sequences ranges from a few hundred to a few thousand. Therefore, another principle to consider is statistical efficiency, which requires that the algorithm recovers the tree with high probability from sequences that have polynomial length in the number of taxa. Unfortunately, almost all the existing algorithms violate one or both of these principles.

Affiliation: Department of Computer Science, Yale University, New Haven, CT 06520; contact csuros-miklos@cs.yale.edu.

Among the known algorithms, the class of parsimony algorithms [14] is the most popular. They attempt to compute a tree that minimizes the number of mutations leading to the observed sequences. Unfortunately, the problem of optimizing parsimony is NP-hard [9]. In addition, it is not a consistent method [5, 13], in that increasing the length of sample sequences generated by an evolutionary tree ad infinitum may still not result in the correct topology being inferred. Consequently, statistical efficiency cannot be expected for all evolutionary trees, and computational efficiency can be achieved only by using heuristics to derive approximations to optimal solutions.

The class of distance-based algorithms is also widely used [23]. These algorithms first calculate pairwise evolutionary distances between taxa from the observed sequences and then build a tree from the resulting distance matrix. Many of these algorithms strive to find, among all possible trees, an evolutionary tree that fits the observed distances best according to some metric. Such an optimization task is provably NP-hard in known cases; see [8] for the $L_1$ and $L_2$ norms, and [1] for $L_\infty$. However, a number of distance-based algorithms are computationally efficient. Examples include Neighbor-Joining [21, 20], Short Quartet [10], Harmonic Greedy Triplets [7], the algorithm of Farach and Kannan [12], and the algorithm of Cryan, Goldberg and Goldberg [6], to mention a few. The statistical efficiency of these algorithms has been analyzed mostly in a simple model of evolution, the Jukes-Cantor model.
The Short Quartet (SQM) and Harmonic Greedy Triplets (HGT) algorithms have been shown to be statistically efficient in this model, whereas the Neighbor-Joining algorithm and the algorithm of Farach and Kannan have exponential sample length requirements [2, 11]. In this paper we derive lower bounds on the sample length requirements for the efficiency of any distance-based algorithm, which match the bounds of HGT and SQM tightly.

2. Model of sequence evolution.

2.1. The Jukes-Cantor model of sequence evolution. From a sample of aligned biological sequences (e.g., see Figure 2.1), one can build a tree that provides a stochastic model of the succession of pointwise mutations leading to the observed sequences. The elements of the sequences are taken from a finite alphabet such as that of amino acids, nucleic acids, or codons. The nodes of such a tree represent taxa, each corresponding to a sequence. The leaves represent terminal taxa, which correspond to observable sequences; the nonleaf nodes represent ancestors of the terminal taxa, which correspond to unobservable sequences. The mutations occur along the edges.

Woolly mammoth (MPR)     CTAAATCATCACTGATCAAAGAGAGC
African elephant (LAF)   CTAAATCATCACCGATCAAAGAGAGC
Asian elephant (EMA)     CTAAATCATCGCTGATCAAAGAGAGC
Dugong (DDG)             TTAAATCACTCCCGATCATAAAGGAGC
Manatee (TMA)            TCAAATCATTACTGACCATAAAGGAGC

Fig. 2.1. Taken from [18], this example uses the sequences of 12S ribosomal RNA and cytochrome b in mitochondrial DNA to establish the evolutionary relationship among the woolly mammoth and its extant relatives. The sequences shown are for 12S rRNA, base positions 81-110, omitting deletions that are common in the five taxa above.

To formalize this modeling, this paper employs the generalized Jukes-Cantor model [16, 17] of sequence evolution, defined as follows. Let $m \ge 2$ and $n \ge 3$ be two integers. Let $A = \{a_1, \ldots, a_m\}$ be a finite alphabet. An evolutionary tree $T$ for $A$ is a rooted binary tree of $n$ leaves with an edge mutation probability $p_e$ for each tree edge $e$. The edge mutation probabilities are bounded away from $0$ and $1 - \frac{1}{m}$, i.e., there exist $f$ and $g$ such that for every edge $e$ of $T$,

$$0 < f \le p_e \le g < 1 - \frac{1}{m}.$$

Given a sequence $s_1 \cdots s_\ell \in A^\ell$ associated with the root of $T$, a set of $n$ mutated sequences in $A^\ell$ is generated by $\ell$ random labelings of the tree at the nodes. These $\ell$ labelings are mutually independent. The labelings at the $j$-th leaf give the $j$-th sequence $s^{(j)}_1 \cdots s^{(j)}_\ell$, and the $i$-th labeling of the tree gives the $i$-th symbols $s^{(1)}_i, \ldots, s^{(n)}_i$. The $i$-th labeling is carried out from the root towards the leaves along the edges. The root is labeled by $s_i$. On edge $e$, the child's label is the same as the parent's with probability $1 - p_e$, or is different with probability $\frac{p_e}{m-1}$ for each different symbol. Such mutations of symbols along the edges are mutually independent.
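The labeling process described above can be illustrated with a short simulation. This is a minimal sketch, not code from the paper: the tree encoding (a dictionary from each node to a list of `(child, p_e)` pairs) and the function names `evolve_site` and `simulate` are assumptions made for the example.

```python
import random

def evolve_site(parent_symbol, p_e, alphabet, rng):
    """Copy one symbol across an edge with mutation probability p_e."""
    if rng.random() < p_e:
        # Each of the m - 1 other symbols is equally likely,
        # so each one is chosen with probability p_e / (m - 1).
        others = [a for a in alphabet if a != parent_symbol]
        return rng.choice(others)
    return parent_symbol

def simulate(tree, root, root_seq, alphabet, rng):
    """Broadcast root_seq from the root towards the leaves.

    tree maps a node to a list of (child, p_e) pairs (hypothetical encoding);
    nodes without an entry are leaves. Returns {leaf: mutated sequence}.
    """
    seqs = {root: list(root_seq)}
    leaves = {}
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        if not children:
            leaves[node] = "".join(seqs[node])
        for child, p_e in children:
            # Sites mutate independently of each other and of other edges.
            seqs[child] = [evolve_site(s, p_e, alphabet, rng) for s in seqs[node]]
            stack.append(child)
    return leaves

# A rooted binary tree with n = 3 leaves over the DNA alphabet (m = 4).
example_tree = {"root": [("A", 0.05), ("u", 0.10)], "u": [("B", 0.05), ("C", 0.20)]}
sample = simulate(example_tree, "root", "ACGT" * 10, "ACGT", random.Random(0))
for leaf in sorted(sample):
    print(leaf, sample[leaf])
```

Only the leaf sequences are returned, mirroring the fact that the sequences at internal nodes are unobservable.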
The topology $\mathcal{T}(T)$ of $T$ is the unrooted tree obtained from $T$ by omitting the edge mutation probabilities and replacing the two edges between the root and its children with a single edge. We further require that the leaves of $\mathcal{T}(T)$ are labeled with the same sequences as in $T$, but $\mathcal{T}(T)$ need not be labeled otherwise. Our task is to design a learning algorithm that takes the $n$ mutated sequences of length $\ell$ as input and recovers $\mathcal{T}(T)$ with high probability.

2.2. Evolutionary distance. Distance-based tree building methods first calculate pairwise evolutionary distances between the terminal taxa, and subsequently build the tree based on these values. The function $\Delta$ is a distance metric on $T$ if and only if the following four conditions hold.
1. $\Delta$ takes values on ordered pairs of nodes, and for any two nodes $X$ and $Y$ in $T$, $\Delta_{XY} \in [0, \infty)$.
2. $\Delta$ is symmetric, i.e., for any two nodes $X$ and $Y$, $\Delta_{XY} = \Delta_{YX}$.
3. $\Delta$ is additive, i.e., for any three nodes $X$, $Y$, and $Z$ such that $Y$ lies on the tree path between $X$ and $Z$, $\Delta_{XZ} = \Delta_{XY} + \Delta_{YZ}$.
4. For two nodes $X$ and $Y$ such that $\Pr\{X = Y\} = 1$, $\Delta_{XY} = 0$. In particular, $\Delta_{XX} = 0$.

For an edge $e$ with endpoints $X$ and $Y$, $\Delta_{XY}$ is referred to as the edge length or edge weight of $e$, denoted by $\Delta_e$. The function $\sigma$ defined as $\sigma_{XY} = e^{-\Delta_{XY}}$ measures the similarity of the tree nodes. By the properties of the distance metric, $0 < \sigma_{XY} \le 1$, $\sigma_{XY} = \sigma_{YX}$, and $\sigma$ is multiplicative along any tree path. For two nodes $X$ and $Y$, let

$$p_{XY} = \Pr\{X \ne Y\}.$$

Also, for brevity, let

$$\alpha = \frac{m}{m-1}.$$

The following theorem is well known in the literature; for a formal proof, see, for example, [7].

Theorem 2.1. Define the function $\Delta$ on every node pair $X, Y$ as

$$\Delta_{XY} = -\ln\left(\Pr\{X = Y\} - \frac{1}{m-1}\Pr\{X \ne Y\}\right) = -\ln(1 - \alpha p_{XY}).$$

Then $\Delta$ is a distance metric in the generalized Jukes-Cantor model of evolution.

One might wonder if there are other possible distance metrics in this model. It is evident that if $\Delta$ is a distance metric, then $c\Delta$ is a distance metric as well, for any $c > 0$. The following lemma shows that if $\Delta$ is a function of $p_{XY}$, then that function is uniquely determined up to a scaling factor.
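The additivity claimed in Theorem 2.1, and the remark that any positive multiple of the distance is again additive, can be checked numerically on a path of two consecutive edges. The sketch below is illustrative only; the helper names `jc_distance` and `compose` are assumptions. The composition rule follows from the mutation model: the endpoints of the path differ if exactly one edge mutates, or if both mutate and the second mutation does not restore the original symbol.

```python
import math

def alpha(m):
    """alpha = m / (m - 1) for an alphabet of size m."""
    return m / (m - 1)

def jc_distance(p, m):
    """Delta = -ln(1 - alpha * p), valid for 0 <= p < 1 - 1/m."""
    return -math.log(1.0 - alpha(m) * p)

def compose(p1, p2, m):
    """Probability that the endpoints of a two-edge path carry different symbols."""
    one_mutation = p1 * (1 - p2) + (1 - p1) * p2
    # Both edges mutate; the second must avoid returning to the original symbol.
    two_mutations = p1 * p2 * (m - 2) / (m - 1)
    return one_mutation + two_mutations

m, p1, p2 = 4, 0.10, 0.25
p_xz = compose(p1, p2, m)
lhs = jc_distance(p_xz, m)
rhs = jc_distance(p1, m) + jc_distance(p2, m)
print(f"p_XZ = {p_xz:.6f}")
print(f"Delta_XZ = {lhs:.6f}, Delta_XY + Delta_YZ = {rhs:.6f}")

# Scaling by any c > 0 preserves additivity.
c = 3.0
print(math.isclose(c * lhs, c * jc_distance(p1, m) + c * jc_distance(p2, m)))
```

With equal edge probabilities $p_1 = p_2 = p$, `compose` reduces to $2p - \alpha p^2$.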

Lemma 2.2. Let $\Delta$ be an additive distance along any tree path, defined by $\Delta_{XY} = \varphi(\Pr\{X \ne Y\})$, where $\varphi : [0, 1 - 1/m) \to [0, +\infty)$ is a function with $\lim_{x \to +0} \varphi(x) = \varphi(0)$. Then there exists a real number $c$ such that $\varphi(p) = -c \ln(1 - \alpha p)$ for all $p$.

Proof. Assume that $X$, $Y$, and $Z$ are three consecutive nodes on a tree path, with $Y$ being a child of $X$ and $Z$ a child of $Y$. Let $p_{XY} = p_{YZ} = p$. Then $p_{XZ} = 2p - \alpha p^2$, because the similarity $1 - \alpha p_{XY}$ of Theorem 2.1 is multiplicative, so that $1 - \alpha p_{XZ} = (1 - \alpha p)^2$. Since $\Delta$ is additive, $2\varphi(p) = \varphi(2p - \alpha p^2)$ for every $p$. Define $\psi_1(x) = (1 - e^{-x})/\alpha$ and $\psi_2(x) = \varphi(\psi_1(x))$ for $x \in [0, \infty)$. Then $2\psi_2(x) = \psi_2(2x)$. We prove by contradiction that there exists a real number $c$ such that $\psi_2(x) = cx$ for all $x$. Assume that there is no such constant, and thus there exist $x, v > 0$ and $u \ne 0$ such that

$$\frac{\psi_2(x + v)}{x + v} = \frac{\psi_2(x)}{x} + u.$$

Define the sequences $a_k = (x + v)/2^k$ and $b_k = x/2^k$ for $k \ge 0$. Since $\psi_2(2x) = 2\psi_2(x)$, we have $\psi_2(a_k)/a_k = \psi_2(x + v)/(x + v)$ and $\psi_2(b_k)/b_k = \psi_2(x)/x$. Therefore, $\psi_2(a_k) = \psi_2(b_k) + u(1 + v/x)$ and thus $\lim_{k \to \infty} \psi_2(a_k) \ne \lim_{k \to \infty} \psi_2(b_k)$. Since $\lim_{k \to \infty} \psi_1(a_k) = \lim_{k \to \infty} \psi_1(b_k) = 0$, $\varphi$ cannot be continuous at $0$, by virtue of the fact that $\lim_{k \to \infty} \varphi(\psi_1(a_k)) \ne \lim_{k \to \infty} \varphi(\psi_1(b_k))$.

3. Evolutionary tree building algorithms.

3.1. Estimation of evolutionary distances. Distance-based algorithms start by estimating evolutionary distances between terminal taxa. If $X$ and $Y$ are leaves, their similarity can be estimated using sample sequences as

$$\hat{\sigma}_{XY} = \frac{1}{\ell} \sum_{i=1}^{\ell} I_{X_i Y_i}, \tag{3.1}$$

where $X_1, \ldots, X_\ell$ and $Y_1, \ldots, Y_\ell$ are the symbols at positions $1, \ldots, \ell$ of the observed sample sequences for the two leaves, and

$$I_{xy} = \begin{cases} -\frac{1}{m-1} & \text{if } x \ne y; \\ 1 & \text{if } x = y. \end{cases}$$

The distance between $X$ and $Y$ is estimated as

$$\hat{\Delta}_{XY} = \begin{cases} -\ln \hat{\sigma}_{XY} & \text{if } \hat{\sigma}_{XY} > 0; \\ \infty & \text{otherwise.} \end{cases} \tag{3.2}$$
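Estimators (3.1) and (3.2) translate directly into code. The following is a minimal sketch under the assumption that the two sequences are already aligned and of equal length; the function names are illustrative. The demonstration uses the woolly mammoth (MPR) and African elephant (LAF) rows of Figure 2.1, which have equal length in this transcription.

```python
import math

def similarity_estimate(x, y, m):
    """Equation (3.1): average of I over aligned positions of x and y."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # I = 1 on a match and -1/(m - 1) on a mismatch.
    total = sum(1.0 if a == b else -1.0 / (m - 1) for a, b in zip(x, y))
    return total / len(x)

def distance_estimate(x, y, m):
    """Equation (3.2): -ln of the similarity estimate, or infinity if it is not positive."""
    s = similarity_estimate(x, y, m)
    return -math.log(s) if s > 0 else math.inf

mpr = "CTAAATCATCACTGATCAAAGAGAGC"
laf = "CTAAATCATCACCGATCAAAGAGAGC"
print(similarity_estimate(mpr, laf, 4))
print(distance_estimate(mpr, laf, 4))
```

Returning `math.inf` for a non-positive similarity estimate mirrors the second branch of (3.2); a practical implementation might prefer to flag such pairs explicitly.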

Distance-based algorithms build the tree by using the distance estimates $\hat{\Delta}$ between leaves. In order to recover the topology successfully, these estimates have to be close to the true distances. The next lemma gives a lower bound on the sequence length required for an accurate estimation.

Lemma 3.1. Let $0 < \epsilon < 1$ and $0 < \delta < 1$. For any leaf pair $X$ and $Y$, if

$$\ell = [\ldots]\left(\frac{1}{\epsilon^2 f^2 \sigma_{XY}^2}\right),$$

then

$$\Pr\left\{ \hat{\Delta}_{XY} - \Delta_{XY} \;[\ldots]\; -\epsilon \ln(1 - \alpha f) \right\} \;[\ldots]\; \delta,$$

where $[\ldots]$ marks an asymptotic symbol or relation that is illegible in this transcription.

Proof. (Sketch.) First observe that $\hat{\sigma}_{XY}$ can be viewed as a linear transformation of a binomially distributed random variable with parameters $\ell$ and $(1 - \sigma_{XY})/\alpha$. The proof bounds the rate of convergence of that binomial random variable by the tail of the standard normal distribution, using the Berry-Esseen Theorem [3] about the convergence rate in the Central Limit Theorem. The resulting integral is bounded by using a Taylor series approximation.

3.2. Distance matrices. A distance matrix is a symmetric $n \times n$ matrix in which diagonal entries are zero and non-diagonal entries are positive. An additive distance matrix is a distance matrix $D$ for which there exists an evolutionary tree with leaves $1, \ldots, n$ such that $D[i, j] = \Delta_{ij}$. A distance-based algorithm is defined as a partial function $F$ on the set of distance matrices such that for any $D$, either $F(D) = \mathit{fail}$ or $F(D)$ is a topology. Each $T$ defines a distance matrix $D_T$ by the distances $\Delta_{XY}$ between pairs of leaves $X, Y$, so that $\mathcal{T}(T)$ is uniquely determined by $D_T$ [4]. We assume that $F(D_T) = \mathcal{T}(T)$.

Based on ideas of Atteson [2] and Erdős et al. [11], we define the following method to construct trees that define distance matrices close to the one defined by $T$. Let $e$ be an edge of $T$. By contracting the edge $e$ and preserving the edge lengths of every other edge, one obtains the non-binary edge-weighted tree $T'$. $T'$ has exactly one vertex adjacent to four edges. Subsequently, this vertex can be replaced by an edge $e'$ with a positive edge weight $\Delta_{e'} \ge -\ln(1 - \alpha f)$.
In this way, one can obtain three trees with different topologies, one of which has the same topology as $T$. If an evolutionary tree $T''$ can be obtained from $T$ with $\Delta_{e'} = x$ in this manner, and $\mathcal{T}(T'') \ne \mathcal{T}(T)$, then $T''$ and $T$ have a similar topology, denoted by $T \overset{e,x}{\vdash} T''$. Let $T''$ define the distance metric $\Delta''$. Define $C_e$ as the set of leaf pairs $XY$ for which $\Delta_{XY} \ne \Delta''_{XY}$. If $x = \Delta_e$, then the matrices $D_T$ and $D_{T''}$ differ only at the entries corresponding to $C_e$, by $\Delta_e$.
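The smallest instance of this construction is a tree with four leaves, where contracting the single internal edge and re-expanding it with the same length yields a different quartet topology. The sketch below is an illustration under assumptions of its own: the leaf names, the pendant edge lengths, and the helper `quartet_distances` are hypothetical, and each leaf could equally stand for a whole subtree hanging off the contracted vertex.

```python
import math
from itertools import combinations

def quartet_distances(pendant, split, x):
    """Leaf-to-leaf path lengths in a quartet tree.

    pendant: {leaf: pendant edge length}
    split:   the two leaf pairs separated by the internal edge
    x:       length of the internal edge
    """
    side = {leaf: i for i, part in enumerate(split) for leaf in part}
    dist = {}
    for u, v in combinations(sorted(pendant), 2):
        d = pendant[u] + pendant[v]
        if side[u] != side[v]:
            d += x  # the path crosses the internal edge
        dist[(u, v)] = d
    return dist

pendant = {"a": 0.10, "b": 0.20, "c": 0.15, "d": 0.30}
delta_e = 0.05
d_t = quartet_distances(pendant, ({"a", "b"}, {"c", "d"}), delta_e)    # topology ab|cd
d_t2 = quartet_distances(pendant, ({"a", "c"}, {"b", "d"}), delta_e)   # topology ac|bd

# C_e: leaf pairs whose distance changes between the two trees.
c_e = {pair for pair in d_t if not math.isclose(d_t[pair], d_t2[pair])}
for pair in sorted(c_e):
    print(pair, abs(d_t[pair] - d_t2[pair]))
```

In this example every changed entry moves by exactly the internal edge length, while the pairs outside $C_e$ keep their distances.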

Theorem 3.2. Let $T$ and $T'$ be two trees such that $T \overset{e,\Delta_e}{\vdash} T'$. Let $\hat{D}$ be a distance matrix corresponding to $\hat{\Delta}$ on leaf pairs, calculated from a sample of length $\ell$ that is generated by either $T$ or $T'$. Suppose that $F$ has a failure probability less than $\delta$ on sequences of length $\ell$. In other words, with probability at least $1 - \delta$, $F(\hat{D}) = \mathcal{T}(T)$ when $\hat{D}$ is generated by $T$, and $F(\hat{D}) = \mathcal{T}(T')$ when $\hat{D}$ is generated by $T'$. Then

$$\ell = \Omega\left(\frac{1}{p_e^2 \max_{XY \in C_e} \sigma_{XY}^2}\right).$$

Proof. (Sketch.) The proof uses Lemma 3.1 to prove a lower bound on $\ell$ by showing that when $\hat{D}$ comes from a shorter sample, $F$ cannot recognize both topologies with high probability.

4. Optimal algorithms. Similarly to [22, 11, 10], we define the notion of depth as follows. The g-depth of a node in a rooted tree is the smallest number of edges in a path from the node to a leaf. Let $e$ be an edge between nodes $u_1$ and $u_2$ in a rooted tree $T'$. Let $T'_1$ and $T'_2$ be the subtrees of $T'$ obtained by cutting $e$ which contain $u_1$ and $u_2$, respectively. The g-depth of $e$ in $T'$ is the larger of the g-depth of $u_1$ in $T'_1$ and that of $u_2$ in $T'_2$. The g-depth of a rooted tree is the largest possible g-depth of an edge in the tree. (We add the prefix g to the term depth because this usage of depth is nonstandard in graph theory.) Define $d$ as the g-depth of $T$. Then $d \le 1 + \lfloor \log_2(n - 1) \rfloor$. As a corollary of Theorem 3.2, we obtain the following result.

Corollary 4.1. For every $2 < d \le 1 + \lfloor \log_2(n - 1) \rfloor$, there is a tree $T$ with depth $d$ such that any algorithm $F$ needs sample sequences of length

$$\ell = \Omega\left(\frac{1}{f^2 (1 - \alpha g)^{4d}}\right)$$

to recover $\mathcal{T}(T)$ with probability $1 - o(1)$.

Proof. (Sketch.) The proof consists of constructing a tree $T$ such that it has an internal edge $e$ with $p_e = f$, every other edge $e'$ has $p_{e'} = g$, and $d$ equals the g-depth of the endpoints of $e$ in the subtrees obtained when $T$ is cut at $e$.

The diameter of $T$ is defined as the maximum number of edges in a path between leaves of $T$. The diameter is always at least as large as $2d$ and can be even $\Omega(n)$.
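The g-depth definition can be made concrete with a breadth-first search on each side of an edge. The sketch below rests on two reading assumptions that the text does not spell out: leaves are the nodes with exactly one neighbor in the whole tree, and a path may run in either direction along edges within the subtree left after the cut. The adjacency-list encoding and function names are illustrative.

```python
from collections import deque

def nearest_leaf_distance(adj, start, blocked):
    """Fewest edges from start to a leaf without stepping onto `blocked`."""
    seen = {start, blocked}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if len(adj[node]) == 1:  # a leaf of the original tree
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("no leaf reachable on this side of the edge")

def edge_g_depth(adj, u1, u2):
    """Larger of the two g-depths of the endpoints after cutting edge (u1, u2)."""
    return max(nearest_leaf_distance(adj, u1, u2),
               nearest_leaf_distance(adj, u2, u1))

def tree_g_depth(adj):
    """Largest g-depth over all edges of the tree."""
    return max(edge_g_depth(adj, u, v) for u in adj for v in adj[u])

# Balanced rooted binary tree with n = 4 leaves: r -> (u, v), u -> (a, b), v -> (c, d).
adj = {
    "r": ["u", "v"],
    "u": ["r", "a", "b"],
    "v": ["r", "c", "d"],
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
}
n = 4
d = tree_g_depth(adj)
bound = 1 + ((n - 1).bit_length() - 1)  # 1 + floor(log2(n - 1))
print(d, bound)
```

On this example the edge between the root and `u` attains the maximum, because the nearest leaf on the root's side is two edges away.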
Many distance-based algorithms have sequence length bounds that are exponential in the diameter and therefore, in the worst case, in $n$. Atteson [2] established an

$$O\left(\frac{\log n}{f^2 (1 - \alpha g)^{\mathrm{diam}}}\right)$$

bound on the sample length requirements of Neighbor-Joining [20] and related algorithms. The algorithms of Agarwala et al. [1] and of Farach and Kannan [12] try to find a tree $T^*$ such that the distance matrix $D_{T^*}$ is close to $\hat{D}$ in the $L_\infty$ metric, i.e., $\max_{i,j} |\hat{D}[i, j] - D_{T^*}[i, j]|$ is small. This approach also results in exponential sequence length requirements. In particular, an

$$\ell = \Omega\left(\frac{1}{f^2 (1 - \alpha g)^{\mathrm{diam}}}\right)$$

lower bound can be derived [11]. Cryan, Goldberg and Goldberg [6] recently developed an algorithm that outputs a tree $T^*$ such that $\max_{i,j} |D_{T^*}[i, j] - D_T[i, j]|$ is small. Our lower bound results apply here as well when this algorithm is used for topology estimation. However, if the goal is the minimization of $\max_{i,j} |D_{T^*}[i, j] - D_T[i, j]|$ rather than the recovery of the topology, then the sample length bounds do not depend on $f$ and $g$.

The Short Quartet Method (SQM) [10] is an algorithm based on a greedy selection of quartets of leaves that recovers the correct tree topology with high probability from short sample sequences. In particular, for every $T$ with depth $d$, there exists an

$$\ell = O\left(\frac{\log n}{f^2 (1 - \alpha g)^{4d+6}}\right)$$

such that the topology is recovered correctly with probability $1 - o(1)$. The Harmonic Greedy Triplets (HGT) algorithm [7] is based on a greedy selection of triplets and successfully recovers $\mathcal{T}(T)$ from sequences of length

$$\ell = O\left(\frac{\log n}{f^2 (1 - \alpha g)^{4d+8}}\right)$$

with probability $1 - o(1)$. In addition, HGT recovers the edge weights with high accuracy, whereas SQM returns only the topology. Both algorithms provide a sample length bound that is close to our lower bound derived for any distance-based algorithm.

5. Summary. We derived lower bounds on the sample length requirements of any distance-based algorithm in the Jukes-Cantor model of sequence evolution. We also showed that the usual evolutionary distance definition is unique in this model up to a scaling factor, and thus no distance-based algorithm can achieve a better performance by using a different distance definition.
Finally, we showed that two distance-based algorithms, the Short Quartet Method and the Harmonic Greedy Triplets algorithm, match these lower bounds closely and therefore offer optimal performance.
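To get a feel for how the bounds of Section 4 compare, the expressions inside the asymptotic symbols can be evaluated for sample parameters. This is only a rough sketch: the constants hidden by $O$ and $\Omega$ are ignored, the parameter values are arbitrary choices for illustration, and the function names are not from the paper.

```python
import math

def alpha(m):
    return m / (m - 1)

def depth_upper_bound(n):
    """1 + floor(log2(n - 1)), the largest possible g-depth for n leaves."""
    return 1 + ((n - 1).bit_length() - 1)

def lower_bound_scale(f, g, d, m):
    """Expression inside Omega(.) in Corollary 4.1."""
    return 1.0 / (f ** 2 * (1 - alpha(m) * g) ** (4 * d))

def sqm_scale(n, f, g, d, m):
    """Expression inside O(.) for the Short Quartet Method."""
    return math.log(n) / (f ** 2 * (1 - alpha(m) * g) ** (4 * d + 6))

def hgt_scale(n, f, g, d, m):
    """Expression inside O(.) for Harmonic Greedy Triplets."""
    return math.log(n) / (f ** 2 * (1 - alpha(m) * g) ** (4 * d + 8))

def diameter_scale(n, f, g, diam, m):
    """Expression inside O(.) for the diameter-based Neighbor-Joining bound."""
    return math.log(n) / (f ** 2 * (1 - alpha(m) * g) ** diam)

m, f, g = 4, 0.05, 0.20      # illustrative values with 0 < f <= g < 1 - 1/m
n = 1024
d = depth_upper_bound(n)     # worst-case depth
diam = n - 1                 # a path-like tree has diameter close to n
print(f"lower bound scale : {lower_bound_scale(f, g, d, m):.3e}")
print(f"SQM scale         : {sqm_scale(n, f, g, d, m):.3e}")
print(f"HGT scale         : {hgt_scale(n, f, g, d, m):.3e}")
print(f"diameter scale    : {diameter_scale(n, f, g, diam, m):.3e}")
```

The depth-based expressions differ from the lower bound only by a $\log n$ factor and a constant number of extra $(1 - \alpha g)^{-1}$ factors, whereas the diameter-based expression grows exponentially in $n$ when the diameter is linear in $n$.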

REFERENCES

[1] R. Agarwala, V. Bafna, M. Farach, B. Narayanan, M. Paterson, and M. Thorup, On the approximability of numerical taxonomy (fitting distances by tree metrics), in Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, Atlanta, Georgia, 28-30 Jan. 1996, pp. 365-372.
[2] K. Atteson, The performance of neighbor-joining algorithms of phylogeny reconstruction, in Computing and Combinatorics, Third Annual International Conference, Shanghai, China, T. Jiang and D. T. Lee, eds., Lecture Notes in Computer Science, Berlin, 1997, Springer-Verlag, pp. 101-110.
[3] R. N. Bhattacharya and R. Ranga Rao, Normal approximation and asymptotic expansions, John Wiley & Sons, New York.
[4] P. Buneman, The recovery of trees from dissimilarity matrices, in Mathematics in the Archaeological and Historical Sciences, F. R. Hodson, D. G. Kendall, and P. Tautu, eds., Edinburgh University Press, Edinburgh, 1971, pp. 387-395.
[5] J. Cavender, Taxonomy with confidence, Mathematical Biosciences, 40 (1978), pp. 271-280.
[6] M. Cryan, L. A. Goldberg, and P. W. Goldberg, Evolutionary trees can be learned in polynomial time in the two-state general Markov model, Tech. Report RR347, Department of Computer Science, University of Warwick, UK; preliminary version at FOCS '98.
[7] M. Csűrös and M.-Y. Kao, Recovering evolutionary trees through Harmonic Greedy Triplets, in SODA '99.
[8] W. H. E. Day, Computational complexity of inferring phylogenies from dissimilarity matrices, Bulletin of Mathematical Biology, 49 (1987), pp. 461-467.
[9] W. H. E. Day, D. S. Johnson, and D. Sankoff, The computational complexity of inferring rooted phylogenies by parsimony, Mathematical Biosciences, 81 (1986), pp. 33-42.
[10] P. Erdős, K. Rice, M. A. Steel, L. A. Székely, and T. Warnow, The Short Quartet Method, Mathematical Modeling and Scientific Computing, (1998), to appear.
[11] P. Erdős, M. A. Steel, L. A. Székely, and T. Warnow, A few logs suffice to build (almost) all trees (II), Tech. Report 97-72, DIMACS.
[12] M. Farach and S. Kannan, Efficient algorithms for inverting evolution, in Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, 22-24 May 1996, pp. 230-236.
[13] J. Felsenstein, Cases in which parsimony or compatibility methods will be positively misleading, Systematic Zoology, 22 (1978), pp. 240-249.
[14] J. Felsenstein, Numerical methods for inferring evolutionary trees, The Quarterly Review of Biology, 57 (1982), pp. 379-404.
[15] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, Cambridge, UK.
[16] T. H. Jukes and C. R. Cantor, Evolution of protein molecules, in Mammalian Protein Metabolism, H. N. Munro, ed., vol. III, Academic Press, New York, 1969, ch. 24, pp. 21-132.
[17] J. Neyman, Molecular studies of evolution: a source of novel statistical problems, in Statistical Decision Theory and Related Topics, S. S. Gupta and J. Yackel, eds., Academic Press, New York, 1971, pp. 1-27.
[18] M. Noro, R. Masuda, I. A. Dubrovo, M. C. Yoshida, and M. Kato, Molecular phylogenetic inference of the Woolly Mammoth Mammuthus primigenius, based on complete sequences of mitochondrial cytochrome b and 12S ribosomal RNA genes, Journal of Molecular Evolution, 46 (1998), pp. 314-326.
[19] C.-Y. Ou, C. A. Cieselski, G. Myers, C. I. Bandea, C.-C. Luo, B. T. M. Korber, J. I. Mullins, G. Schochetman, R. L. Berkelman, A. N. Economou, J. J. Witte, L. J. Furman, G. A. Satten, K. A. MacInnes, J. W. Curran, and H. W. Jaffe, Molecular epidemiology of HIV transmission in a dental practice, Science, 256 (1992), pp. 1165-1171.
[20] N. Saitou and M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Molecular Biology and Evolution, 4 (1987), pp. 406-425.
[21] S. Sattath and A. Tversky, Additive similarity trees, Psychometrika, 42 (1977), pp. 319-345.
[22] D. D. Sleator and R. E. Tarjan, A data structure for dynamic trees, Journal of Computer and System Sciences, 26 (1983), pp. 362-391.
[23] D. L. Swofford, G. J. Olsen, P. J. Waddell, and D. M. Hillis, Phylogenetic inference, in Molecular Systematics, D. M. Hillis, C. Moritz, and B. K. Mable, eds., Sinauer Associates, Inc., Sunderland, MA, 2nd ed., 1996, ch. 11, pp. 407-


More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information
