LOWER BOUNDS ON SEQUENCE LENGTHS REQUIRED TO RECOVER THE EVOLUTIONARY TREE. (extended abstract submitted to RECOMB '99)
MIKLÓS CSŰRÖS AND MING-YANG KAO

Department of Computer Science, Yale University, New Haven, CT 06520; contact csuros-miklos@cs.yale.edu.

Abstract. In this paper we study the sequence length requirements of distance-based evolutionary tree building algorithms in the Jukes-Cantor model of evolution. By deriving lower bounds on the sequence lengths required to recover the evolutionary tree topology correctly, we show that two algorithms, the Short Quartet Method and the Harmonic Greedy Triplets algorithm, have optimal sequence length requirements.

1. Introduction. A main area of computational biology is the development and analysis of evolutionary tree building algorithms [15]. By using biomolecular sequences, these algorithms have not only enabled the exploration of evolutionary relationships among species but also led to the discovery of new proteins and to the inference of transmission chains of viral diseases such as AIDS [19].

The evolutionary tree in biology can be defined as an edge-weighted binary tree in which the nodes correspond to taxa and the edge weights correspond to the time of divergence between them [23]. Evolution in this model can be viewed as the broadcasting of a character sequence (the root sequence) along the edges from the root towards the leaves. On each edge, the sequence undergoes a certain number of changes or mutations, and the mutated sequences are observed at the leaves. By defining a meaningful mutation model that relates changes in character sequences to time of divergence, one can attempt to reconstruct the weighted binary tree. The primary goal of such reconstruction is to correctly recover the tree topology, i.e., the tree without the edge weights. A secondary goal is to estimate the edge weights as accurately as possible.

A probabilistic model of mutations defines a probability distribution over sets of $n$ sequences of length $\ell$, where $n$ is the number of leaves and $\ell$ is the length of an observed sequence. An obvious requirement for a tree building algorithm, which we call computational efficiency, is that it runs in time polynomial in $n$ and $\ell$. It is also important to realize that the sequences are rather short, as the length of currently available biomolecular sequences ranges from a few hundred to a few thousand. Therefore, another principle to consider is statistical efficiency, which requires that the algorithm recovers the tree with high probability from sequences whose length is polynomial in the number of taxa.

Unfortunately, almost all the existing algorithms violate one or both of these principles. Among the known algorithms, the class of parsimony algorithms [14] is the most popular. They attempt to compute a tree that minimizes the number of mutations leading to the observed sequences. Unfortunately, the problem of optimizing parsimony is NP-hard [9]. In addition, parsimony is not a consistent method [5, 13], in that increasing the length of the sample sequences generated by an evolutionary tree ad infinitum may still not result in the correct topology being inferred. Consequently, statistical efficiency cannot be expected for all evolutionary trees, and computational efficiency can be achieved only by using heuristics to derive approximations to optimal solutions.

The class of distance-based algorithms is also widely used [23]. These algorithms first calculate pairwise evolutionary distances between taxa from the observed sequences and then build a tree from the resulting distance matrix. Many of these algorithms strive to find, among all possible trees, an evolutionary tree that fits the observed distances best according to some metric. Such an optimization task is provably NP-hard in known cases; see [8] for the $L_1$ and $L_2$ norms, and see [1] for $L_\infty$. However, a number of distance-based algorithms are computationally efficient. Examples include Neighbor-Joining [21, 20], Short Quartet [10], Harmonic Greedy Triplets [7], the algorithm of Farach and Kannan [12], and the algorithm of Cryan, Goldberg and Goldberg [6], to mention a few. The statistical efficiency of these algorithms has been analyzed mostly in a simple model of evolution, the Jukes-Cantor model.
The Short Quartet (SQM) and Harmonic Greedy Triplets (HGT) algorithms have been shown to be statistically efficient in this model, whereas the Neighbor-Joining algorithm and the algorithm of Farach and Kannan have exponential sample length requirements [2, 11]. In this paper we derive lower bounds on the sample length requirements of any distance-based algorithm, which match the bounds of HGT and SQM tightly.

2. Model of sequence evolution.

The Jukes-Cantor model of sequence evolution. From a sample of aligned biological sequences (e.g., see Figure 2.1), one can build a tree that provides a stochastic model of the succession of pointwise mutations leading to the observed sequences. The elements of the sequences are taken from a finite alphabet such as that of amino acids, nucleic acids, or codons. The nodes of such a tree represent taxa, each corresponding to a sequence. The leaves represent terminal taxa, which correspond to observable sequences; the nonleaf nodes represent ancestors of the terminal taxa, which correspond to unobservable sequences. The mutations occur along the edges.

Woolly mammoth (MPR)      CTAAATCATCACTGATCAAAGAGAGC
African elephant (LAF)    CTAAATCATCACCGATCAAAGAGAGC
Asian elephant (EMA)      CTAAATCATCGCTGATCAAAGAGAGC
Dugong (DDG)              TTAAATCACTCCCGATCATAAAGGAGC
Manatee (TMA)             TCAAATCATTACTGACCATAAAGGAGC

[Figure: aligned sequences by character position, with a tree on the leaves MPR, LAF, EMA, DDG, TMA.]

Fig. 2.1. Taken from [18], this example uses the sequences of 12S ribosomal RNA and cytochrome b in mitochondrial DNA to establish the evolutionary relationship among the woolly mammoth and its extant relatives. The sequences shown are for 12S rRNA, base positions 81–110, omitting deletions that are common in the five taxa above.

To formalize this modeling, this paper employs the generalized Jukes-Cantor model [16, 17] of sequence evolution, defined as follows. Let $m \ge 2$ and $n \ge 3$ be two integers. Let $A = \{a_1, \ldots, a_m\}$ be a finite alphabet. An evolutionary tree $T$ for $A$ is a rooted binary tree of $n$ leaves with an edge mutation probability $p_e$ for each tree edge $e$. The edge mutation probabilities are bounded away from $0$ and $1 - \frac{1}{m}$, i.e., there exist $f$ and $g$ such that for every edge $e$ of $T$,

$$0 < f \le p_e \le g < 1 - \frac{1}{m}.$$

Given a sequence $s_1 \cdots s_\ell \in A^\ell$ associated with the root of $T$, a set of $n$ mutated sequences in $A^\ell$ is generated by $\ell$ random labelings of the tree at the nodes. These $\ell$ labelings are mutually independent. The labelings at the $j$-th leaf give the $j$-th sequence $s^{(j)}_1 \cdots s^{(j)}_\ell$, and the $i$-th labeling of the tree gives the $i$-th symbols $s^{(1)}_i, \ldots, s^{(n)}_i$. The $i$-th labeling is carried out from the root towards the leaves along the edges. The root is labeled by $s_i$. On edge $e$, the child's label is the same as the parent's with probability $1 - p_e$, or is different with probability $\frac{p_e}{m-1}$ for each different symbol. Such mutations of symbols along the edges are mutually independent.
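The generative process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree shape, the node names, the edge probabilities and the helper names `mutate` and `evolve` are hypothetical and do not come from the paper.

```python
import random

def mutate(symbol, p_e, alphabet, rng):
    """Return the child's symbol on an edge with mutation probability p_e."""
    if rng.random() < p_e:
        # Each of the m - 1 other symbols is chosen with probability p_e / (m - 1).
        return rng.choice([a for a in alphabet if a != symbol])
    return symbol

def evolve(children, p, root, root_seq, alphabet, rng):
    """Label every node from the root towards the leaves; return the leaf sequences.

    children: dict mapping a node to the list of its children (leaves are absent)
    p:        dict mapping (parent, child) to the edge mutation probability p_e
    """
    seqs = {root: list(root_seq)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # Sequence positions are labeled independently of one another.
            seqs[child] = [mutate(s, p[(node, child)], alphabet, rng)
                           for s in seqs[node]]
            stack.append(child)
    return {v: "".join(s) for v, s in seqs.items() if not children.get(v)}

# Hypothetical example: a rooted binary tree with n = 3 leaves over the DNA alphabet.
alphabet = "ACGT"
children = {"root": ["u", "c"], "u": ["a", "b"]}
p = {("root", "u"): 0.05, ("root", "c"): 0.10,
     ("u", "a"): 0.10, ("u", "b"): 0.10}
rng = random.Random(0)
ell = 20000
root_seq = [rng.choice(alphabet) for _ in range(ell)]
leaves = evolve(children, p, "root", root_seq, alphabet, rng)
```

Only the sequences at the leaves are returned, mirroring the fact that internal sequences are unobservable in the model.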
The topology $\mathcal{T}(T)$ of $T$ is the unrooted tree obtained from $T$ by omitting the edge mutation probabilities and replacing the two edges between the root and its children with a single edge. We further require that the leaves of $\mathcal{T}(T)$ are labeled with the same sequences as in $T$, but $\mathcal{T}(T)$ need not be labeled otherwise. Our task is to design a learning algorithm that takes the $n$ mutated sequences of length $\ell$ as input and recovers $\mathcal{T}(T)$ with high probability.

Evolutionary distance. Distance-based tree building methods first calculate pairwise evolutionary distances between the terminal taxa, and subsequently build the tree based on these values. The function $\Delta$ is a distance metric on $T$ if and only if the following four conditions hold.
1. $\Delta$ takes values on ordered pairs of nodes, and for any two nodes $X$ and $Y$ in $T$, $\Delta_{XY} \in [0, \infty)$.
2. $\Delta$ is symmetric, i.e., for any two nodes $X$ and $Y$, $\Delta_{XY} = \Delta_{YX}$.
3. $\Delta$ is additive, i.e., for any three nodes $X$, $Y$, and $Z$ such that $Y$ lies on the tree path between $X$ and $Z$, $\Delta_{XZ} = \Delta_{XY} + \Delta_{YZ}$.
4. For two nodes $X$ and $Y$ such that $\Pr\{X = Y\} = 1$, $\Delta_{XY} = 0$. In particular, $\Delta_{XX} = 0$.

For an edge $e$ with endpoints $X$ and $Y$, $\Delta_{XY}$ is referred to as the edge weight or edge length of $e$, denoted by $\Delta_e$. The function $\sigma$ defined as $\sigma_{XY} = e^{-\Delta_{XY}}$ measures the similarity of the tree nodes. By the properties of the distance metric, $0 < \sigma_{XY} \le 1$, $\sigma_{XY} = \sigma_{YX}$, and $\sigma$ is multiplicative along any tree path. For two nodes $X$ and $Y$, let

$$p_{XY} = \Pr\{X \ne Y\}.$$

Also, for brevity, let

$$\alpha = \frac{m}{m-1}.$$

The following theorem is well known in the literature; for a formal proof see, for example, [7].

Theorem 2.1. Define the function $\Delta$ on every node pair $X, Y$ as

$$\Delta_{XY} = -\ln\left(\Pr\{X = Y\} - \frac{1}{m-1}\Pr\{X \ne Y\}\right) = -\ln(1 - \alpha p_{XY}).$$

Then $\Delta$ is a distance metric in the generalized Jukes-Cantor model of evolution.

One might wonder if there are other possible distance metrics in this model. It is evident that if $\Delta$ is a distance metric, then $c\Delta$ is a distance metric as well, for any $c > 0$. The following lemma shows that if $\Delta$ is a function of $p_{XY}$, then that function is uniquely determined up to a scaling factor.
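As a quick numerical sanity check of the additivity asserted by Theorem 2.1, the sketch below composes two edges on a downward path and compares the distances. The function names `compose` and `jc_distance` and the chosen values of $m$, $p_{XY}$ and $p_{YZ}$ are illustrative assumptions, not part of the paper.

```python
import math

def alpha(m):
    """The constant m / (m - 1) for an alphabet of size m."""
    return m / (m - 1)

def compose(p_xy, p_yz, m):
    """Pr{X != Z} for consecutive nodes X, Y, Z on a downward tree path.

    Pr{X = Z} = (1 - p_xy)(1 - p_yz) + p_xy * p_yz / (m - 1),
    which rearranges to the expression returned here.
    """
    return p_xy + p_yz - alpha(m) * p_xy * p_yz

def jc_distance(p, m):
    """The distance of Theorem 2.1 written as a function of p = Pr{X != Y}."""
    return -math.log(1.0 - alpha(m) * p)

# Hypothetical values: DNA alphabet and two arbitrary edge mutation probabilities.
m = 4
p_xy, p_yz = 0.10, 0.25
lhs = jc_distance(compose(p_xy, p_yz, m), m)          # distance from X to Z
rhs = jc_distance(p_xy, m) + jc_distance(p_yz, m)     # sum of the two edge lengths
print(math.isclose(lhs, rhs))
```

With $p_{XY} = p_{YZ} = p$, `compose` reduces to $2p - \alpha p^2$, the identity used in the proof of Lemma 2.2.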
Lemma 2.2. Let $\Delta$ be an additive distance along any tree path, defined by $\Delta_{XY} = \varphi(\Pr\{X \ne Y\})$, where $\varphi : [0, 1 - 1/m) \to [0, +\infty)$ is a function with $\lim_{x \to +0} \varphi(x) = \varphi(0)$. Then there exists a real number $c$ such that $\varphi(p) = -c \ln(1 - \alpha p)$ for all $p$.

Proof. Assume that $X$, $Y$, and $Z$ are three consecutive nodes on a tree path, with $Y$ being a child of $X$ and $Z$ a child of $Y$. Let $p_{XY} = p_{YZ} = p$. Then $p_{XZ} = 2p - \alpha p^2$ because $\sigma$ is multiplicative. Since $\Delta$ is additive, $2\varphi(p) = \varphi(2p - \alpha p^2)$ for every $p$. Define $\psi_1(x) = (1 - e^{-x})/\alpha$ and $\psi_2(x) = \varphi(\psi_1(x))$ for $x \in [0, \infty)$. Then $2\psi_2(x) = \psi_2(2x)$. We prove by contradiction that there exists a real number $c$ such that $\psi_2(x) = cx$ for all $x$. Assume that there is no such constant, and thus there exist $x, v > 0$ and $u \ne 0$ such that

$$\frac{\psi_2(x + v)}{x + v} = \frac{\psi_2(x)}{x} + u.$$

Define the sequences $a_k = (x + v)/2^k$ and $b_k = x/2^k$ for $k \ge 0$. Since $\psi_2(2x) = 2\psi_2(x)$, $\psi_2(a_k)/a_k = \psi_2(x + v)/(x + v)$ and $\psi_2(b_k)/b_k = \psi_2(x)/x$. Therefore, $\psi_2(a_k) = \psi_2(b_k) + u(1 + v/x)$, and thus $\lim_{k \to \infty} \psi_2(a_k) \ne \lim_{k \to \infty} \psi_2(b_k)$. Since $\lim_{k \to \infty} \psi_1(a_k) = \lim_{k \to \infty} \psi_1(b_k) = 0$, $\varphi$ cannot be continuous at $0$, by virtue of the fact that $\lim_{k \to \infty} \varphi(\psi_1(a_k)) \ne \lim_{k \to \infty} \varphi(\psi_1(b_k))$.

3. Evolutionary tree building algorithms.

Estimation of evolutionary distances. Distance-based algorithms start by estimating evolutionary distances between terminal taxa. If $X$ and $Y$ are leaves, their similarity can be estimated using sample sequences as

$$\hat\sigma_{XY} = \frac{1}{\ell} \sum_{i=1}^{\ell} I_{X_i Y_i}, \tag{3.1}$$

where $X_1, \ldots, X_\ell$ and $Y_1, \ldots, Y_\ell$ are the symbols at positions $1, \ldots, \ell$ of the observed sample sequences for the two leaves, and

$$I_{xy} = \begin{cases} -\frac{1}{m-1} & \text{if } x \ne y; \\ 1 & \text{if } x = y. \end{cases}$$

The distance between $X$ and $Y$ is estimated as

$$\hat\Delta_{XY} = \begin{cases} -\ln \hat\sigma_{XY} & \text{if } \hat\sigma_{XY} > 0; \\ \infty & \text{otherwise.} \end{cases} \tag{3.2}$$
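The estimators (3.1) and (3.2) translate directly into code. The following is a minimal sketch; the function names and the two toy sequences are assumptions made for illustration only.

```python
import math

def similarity_estimate(x, y, m):
    """Estimate (3.1): the average of I over aligned positions of x and y."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(1.0 if a == b else -1.0 / (m - 1) for a, b in zip(x, y))
    return total / len(x)

def distance_estimate(x, y, m):
    """Estimate (3.2): minus the log of the similarity estimate, or infinity."""
    s = similarity_estimate(x, y, m)
    return -math.log(s) if s > 0 else math.inf

# Toy example over the DNA alphabet (m = 4): 7 matches and 1 mismatch.
x = "ACGTACGT"
y = "ACGTACGA"
sigma_hat = similarity_estimate(x, y, 4)   # (7 - 1/3) / 8
delta_hat = distance_estimate(x, y, 4)
```

When too many positions differ, the similarity estimate is not positive and the distance estimate is infinite, which is the second case of (3.2).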
Distance-based algorithms build the tree by using the distance estimates $\hat\Delta$ between leaves. In order to recover the topology successfully, these estimates have to be close to the true distances. The next lemma gives a lower bound on the sequence length required for an accurate estimation.

Lemma 3.1. Let 0 < < 1, and 0 < < 1. For any leaf pairs X and Y, if then ` = 1 2 f 2 2 XY Pr n ^ XY? XY? ln(1? f) o ;.,

Proof. (Sketch.) First observe that $\hat\sigma_{XY}$ can be viewed as a linear transformation of a binomially distributed random variable with parameters $\ell$ and $(1 - \sigma_{XY})/\alpha$. The proof bounds the rate of convergence of that binomial random variable by the tail of the standard normal distribution, using the Berry-Esseen Theorem [3] about the convergence rate in the Central Limit Theorem. The resulting integral is bounded by using a Taylor series approximation.

Distance matrices. A distance matrix is a symmetric $n \times n$ matrix in which diagonal entries are zero and non-diagonal entries are positive. An additive distance matrix is a distance matrix $D$ for which there exists an evolutionary tree with leaves $1, \ldots, n$ such that $D[i, j] = \Delta_{ij}$. A distance-based algorithm is defined as a partial function $F$ on the set of distance matrices such that for any $D$, either $F(D) = \mathit{fail}$ or $F(D)$ is a topology. Each $T$ defines a distance matrix $D_T$ by the distances $\Delta_{XY}$ between pairs of nodes, so that $\mathcal{T}(T)$ is uniquely determined by $D_T$ [4]. We assume that $F(D_T) = \mathcal{T}(T)$.

Based on ideas of Atteson [2] and Erdős et al. [11], we define the following method to construct trees that define distance matrices close to the one defined by $T$. Let $e$ be an edge of $T$. By contracting the edge $e$ and preserving the edge lengths of every other edge, one obtains the non-binary edge-weighted tree $T'$. $T'$ has exactly one vertex adjacent to four edges. Subsequently, this vertex can be replaced by an edge $e'$ with a positive edge weight $\Delta_{e'} \ge -\ln(1 - \alpha f)$.
In this way, one can obtain three trees with different topologies, one of which has the same topology as $T$. If an evolutionary tree $T''$ can be obtained from $T$ with $\Delta_{e'} = x$ in this manner, and $\mathcal{T}(T'') \ne \mathcal{T}(T)$, then $T''$ and $T$ have a similar topology, denoted by $T \vdash^{e,x} T''$. Let $T''$ define the distance metric $\Delta''$. Define $C_e$ as the set of leaf pairs $XY$ for which $\Delta_{XY} \ne \Delta''_{XY}$. If $x = \Delta_e$, then the matrices $D_T$ and $D_{T''}$ differ only at the entries corresponding to $C_e$, by $\Delta_e$.
Theorem 3.2. Let $T$ and $T'$ be two trees such that $T \vdash^{e,\Delta_e} T'$. Let $\hat D$ be a distance matrix corresponding to $\hat\Delta$ on leaf pairs, calculated from a sample of length $\ell$ that is generated by either $T$ or $T'$. Suppose that $F$ has a failure probability less than $\delta$ on sequences of length $\ell$. In other words, with probability at least $1 - \delta$, $F(\hat D) = \mathcal{T}(T)$ when $\hat D$ is generated by $T$, and $F(\hat D) = \mathcal{T}(T')$ when $\hat D$ is generated by $T'$. Then

$$\ell = \Omega\left(\frac{1}{p_e^2 \max_{XY \in C_e} \sigma_{XY}^2}\right).$$

Proof. (Sketch.) The proof uses Lemma 3.1 to prove a lower bound on $\ell$ by showing that when $\hat D$ comes from a shorter sample, $F$ cannot recognize both topologies with high probability.

4. Optimal algorithms. Similarly to [22, 11, 10], we define the notion of depth as follows. The g-depth of a node in a rooted tree is the smallest number of edges in a path from the node to a leaf. Let $e$ be an edge between nodes $u_1$ and $u_2$ in a rooted tree $T'$. Let $T'_1$ and $T'_2$ be the subtrees of $T'$ obtained by cutting $e$ which contain $u_1$ and $u_2$, respectively. The g-depth of $e$ in $T'$ is the larger of the g-depth of $u_1$ in $T'_1$ and that of $u_2$ in $T'_2$. The g-depth of a rooted tree is the largest possible g-depth of an edge in the tree. (We add the prefix g to the term depth because this usage of depth is nonstandard in graph theory.) Define $d$ as the g-depth of $T$. Then $d \le 1 + \lfloor \log_2(n - 1) \rfloor$. As a corollary of Theorem 3.2, we obtain the following result.

Corollary 4.1. For every $2 < d \le 1 + \lfloor \log_2(n - 1) \rfloor$, there is a tree $T$ with depth $d$ such that any algorithm $F$ needs sample sequences of length

$$\ell = \Omega\left(\frac{1}{f^2 (1 - \alpha g)^{4d}}\right)$$

to recover $\mathcal{T}(T)$ with probability $1 - o(1)$.

Proof. (Sketch.) The proof consists of constructing a tree $T$ such that it has an internal edge $e$ with $p_e = f$, every other edge $e'$ has $p_{e'} = g$, and $d$ equals the g-depth of the endpoints of $e$ in the subtrees obtained when $T$ is cut at $e$.

The diameter of $T$ is defined as the maximum number of edges in a path between leaves of $T$. The diameter is always at least as large as $2d$ and can be even $\Omega(n)$.
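The definitions of g-depth and diameter can be made concrete with a short breadth-first search. The sketch below is one possible reading, under the assumption that the tree is given as an unrooted adjacency structure (the root already suppressed, as in the topology) and that leaves are the degree-1 nodes. The helper names and the example tree are hypothetical.

```python
from collections import deque

def from_edges(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def nearest_leaf(adj, start, blocked):
    """Fewest edges from start to a leaf without crossing the edge (start, blocked)."""
    seen = {start, blocked}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if len(adj[node]) == 1:          # degree-1 nodes are the leaves
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("no leaf reachable")

def g_depth(adj):
    """Largest g-depth over all edges; each edge takes the larger of its two sides."""
    return max(max(nearest_leaf(adj, u, v), nearest_leaf(adj, v, u))
               for u in adj for v in adj[u])

def diameter(adj):
    """Maximum number of edges on a path between two leaves."""
    best = 0
    for leaf in [v for v in adj if len(adj[v]) == 1]:
        dist = {leaf: 0}
        queue = deque([leaf])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        best = max(best, max(d for v, d in dist.items() if len(adj[v]) == 1))
    return best

# Hypothetical balanced topology with n = 8 leaves l1..l8.
balanced8 = from_edges([
    ("a", "b"), ("a", "a1"), ("a", "a2"), ("b", "b1"), ("b", "b2"),
    ("a1", "l1"), ("a1", "l2"), ("a2", "l3"), ("a2", "l4"),
    ("b1", "l5"), ("b1", "l6"), ("b2", "l7"), ("b2", "l8"),
])
d = g_depth(balanced8)
diam = diameter(balanced8)
```

For this example the g-depth stays below the bound $1 + \lfloor \log_2(n-1) \rfloor = 3$, and the diameter is at least $2d$.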
Many distance-based algorithms have sequence length bounds that are exponential in the diameter and therefore, since the diameter can be $\Omega(n)$, in $n$. Atteson [2] established an

$$O\left(\frac{\log n}{f^2 (1 - \alpha g)^{\mathrm{diam}}}\right)$$

bound on the sample length requirements of Neighbor-Joining [20] and related algorithms. The algorithms of Agarwala et al. [1] and of Farach and Kannan [12] try to find a tree $T$ such that the distance matrix $D_T$ is close to $\hat D$ in the $L_\infty$ metric, i.e., $\max_{i,j} |\hat D[i, j] - D_T[i, j]|$ is small. This approach also results in exponential sequence length requirements. In particular, an

$$\ell = \Omega\left(\frac{1}{f^2 (1 - \alpha g)^{\mathrm{diam}}}\right)$$

lower bound can be derived [11]. Cryan, Goldberg and Goldberg [6] recently developed an algorithm that outputs a tree $T^*$ such that $\max_{i,j} |D_{T^*}[i, j] - D_T[i, j]|$ is small. Our lower bound results apply here as well, when this algorithm is used for topology estimation. However, if the goal is the minimization of $\max_{i,j} |D_{T^*}[i, j] - D_T[i, j]|$ rather than the recovery of the topology, then the sample length bounds do not depend on $f$ and $g$.

The Short Quartet Method (SQM) [10] is an algorithm based on a greedy selection of quartets of leaves that recovers the correct tree topology with high probability from short sample sequences. In particular, for every $T$ with depth $d$, there exists an

$$\ell = O\left(\frac{\log n}{f^2 (1 - \alpha g)^{4d+6}}\right)$$

such that the topology is recovered correctly with probability $1 - o(1)$. The Harmonic Greedy Triplets (HGT) [7] algorithm is based on a greedy selection of triplets and successfully recovers $\mathcal{T}(T)$ from sequences of length

$$\ell = O\left(\frac{\log n}{f^2 (1 - \alpha g)^{4d+8}}\right)$$

with probability $1 - o(1)$. In addition, HGT recovers the edge weights with high accuracy, whereas SQM returns only the topology. Both algorithms provide a sample length bound that is close to our lower bound derived for any distance-based algorithm.

5. Summary. We derived lower bounds on the sample length requirements of any distance-based algorithm in the Jukes-Cantor model of sequence evolution. We also showed that the usual evolutionary distance definition is unique in this model, and thus no distance-based algorithm can achieve a better performance by using a different distance definition.
Finally, we showed that two distance-based algorithms, the Short Quartet and the Harmonic Greedy Triplets algorithms, match these lower bounds closely and therefore offer optimal performance.
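To get a rough feel for how the bounds of Section 4 relate, the sketch below evaluates the expressions inside the $\Omega(\cdot)$ and $O(\cdot)$ notation for one parameter setting. The constants hidden by the asymptotic notation are dropped, so only ratios and growth rates are meaningful. The function names and all numeric values are assumptions chosen for illustration.

```python
import math

def lower_bound_term(f, g, d, m):
    """Expression inside the Omega of Corollary 4.1 (hidden constant dropped)."""
    a = m / (m - 1)
    return 1.0 / (f ** 2 * (1 - a * g) ** (4 * d))

def upper_bound_term(n, f, g, d, m, extra):
    """Expression inside the O bound for SQM (extra = 6) or HGT (extra = 8)."""
    a = m / (m - 1)
    return math.log(n) / (f ** 2 * (1 - a * g) ** (4 * d + extra))

def diameter_term(f, g, diam, m):
    """Expression inside the diameter-based Omega bound quoted from [11]."""
    a = m / (m - 1)
    return 1.0 / (f ** 2 * (1 - a * g) ** diam)

# Hypothetical setting: DNA alphabet, n = 1024 leaves, g-depth 5, diameter 200.
n, m, f, g, d, diam = 1024, 4, 0.01, 0.1, 5, 200
lower = lower_bound_term(f, g, d, m)
sqm = upper_bound_term(n, f, g, d, m, extra=6)
hgt = upper_bound_term(n, f, g, d, m, extra=8)
by_diameter = diameter_term(f, g, diam, m)

print(f"SQM / lower bound: {sqm / lower:.2f}")
print(f"HGT / lower bound: {hgt / lower:.2f}")
print(f"diameter-based / lower bound: {by_diameter / lower:.3e}")
```

The ratios for SQM and HGT depend only on $\log n$ and a fixed power of $1 - \alpha g$, whereas the diameter-based expression grows exponentially with the gap between the diameter and $4d$.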
REFERENCES

[1] R. Agarwala, V. Bafna, M. Farach, B. Narayanan, M. Paterson, and M. Thorup, On the approximability of numerical taxonomy (fitting distances by tree metrics), in Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, Atlanta, Georgia, 28–30 Jan. 1996, pp. 365–372.
[2] K. Atteson, The performance of neighbor-joining algorithms of phylogeny reconstruction, in Computing and Combinatorics, Third Annual International Conference, Shanghai, China, T. Jiang and D. T. Lee, eds., Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1997, pp. 101–110.
[3] R. N. Bhattacharya and R. Ranga Rao, Normal approximation and asymptotic expansions, John Wiley & Sons, New York.
[4] P. Buneman, The recovery of trees from dissimilarity matrices, in Mathematics in the Archaeological and Historical Sciences, F. R. Hodson, D. G. Kendall, and P. Tautu, eds., Edinburgh University Press, Edinburgh, 1971, pp. 387–395.
[5] J. Cavender, Taxonomy with confidence, Mathematical Biosciences, 40 (1978), pp. 271–280.
[6] M. Cryan, L. A. Goldberg, and P. W. Goldberg, Evolutionary trees can be learned in polynomial time in the two-state general Markov model, Tech. Report RR347, Department of Computer Science, University of Warwick, UK. Preliminary version at FOCS '98.
[7] M. Csűrös and M.-Y. Kao, Recovering evolutionary trees through Harmonic Greedy Triplets, in SODA '99.
[8] W. H. E. Day, Computational complexity of inferring phylogenies from dissimilarity matrices, Bulletin of Mathematical Biology, 49 (1987), pp. 461–467.
[9] W. H. E. Day, D. S. Johnson, and D. Sankoff, The computational complexity of inferring rooted phylogenies by parsimony, Mathematical Biosciences, 81 (1986), pp. 33–42.
[10] P. Erdős, K. Rice, M. A. Steel, L. A. Székely, and T. Warnow, The Short Quartet Method, Mathematical Modeling and Scientific Computing, (1998). To appear.
[11] P. Erdős, M. A. Steel, L. A. Székely, and T. Warnow, A few logs suffice to build (almost) all trees (II), Tech. Report 97-72, DIMACS.
[12] M. Farach and S. Kannan, Efficient algorithms for inverting evolution, in Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, 22–24 May 1996, pp. 230–236.
[13] J. Felsenstein, Cases in which parsimony or compatibility methods will be positively misleading, Systematic Zoology, 22 (1978), pp. 240–249.
[14] J. Felsenstein, Numerical methods for inferring evolutionary trees, The Quarterly Review of Biology, 57 (1982), pp. 379–404.
[15] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, Cambridge, UK.
[16] T. H. Jukes and C. R. Cantor, Evolution of protein molecules, in Mammalian Protein Metabolism, H. N. Munro, ed., vol. III, Academic Press, New York, 1969, ch. 24, pp. 21–132.
[17] J. Neyman, Molecular studies of evolution: a source of novel statistical problems, in Statistical Decision Theory and Related Topics, S. S. Gupta and J. Yackel, eds., Academic Press, New York, 1971, pp. 1–27.
[18] M. Noro, R. Masuda, I. A. Dubrovo, M. C. Yoshida, and M. Kato, Molecular phylogenetic inference of the Woolly Mammoth Mammuthus primigenius, based on complete sequences of mitochondrial cytochrome b and 12S ribosomal RNA genes, Journal of Molecular Evolution, 46 (1998), pp. 314–326.
[19] C.-Y. Ou, C. A. Ciesielski, G. Myers, C. I. Bandea, C.-C. Luo, B. T. M. Korber, J. I. Mullins, G. Schochetman, R. L. Berkelman, A. N. Economou, J. J. Witte, L. J. Furman, G. A. Satten, K. A. MacInnes, J. W. Curran, and H. W. Jaffe, Molecular epidemiology of HIV transmission in a dental practice, Science, 256 (1992), pp. 1165–1171.
[20] N. Saitou and M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Molecular Biology and Evolution, 4 (1987), pp. 406–425.
[21] S. Sattath and A. Tversky, Additive similarity trees, Psychometrika, 42 (1977), pp. 319–345.
[22] D. D. Sleator and R. E. Tarjan, A data structure for dynamic trees, Journal of Computer and System Sciences, 26 (1983), pp. 362–391.
[23] D. L. Swofford, G. J. Olsen, P. J. Waddell, and D. M. Hillis, Phylogenetic inference, in Molecular Systematics, D. M. Hillis, C. Moritz, and B. K. Mable, eds., Sinauer Associates, Inc., Sunderland, MA, 2nd ed., 1996, ch. 11, pp. 407–
Reconstruction of certain phylogenetic networks from their tree-average distances Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu October 10,
More informationPage 1. Evolutionary Trees. Why build evolutionary tree? Outline
Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny
More informationPhylogenetics: Building Phylogenetic Trees
1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should
More informationOn the Uniqueness of the Selection Criterion in Neighbor-Joining
Journal of Classification 22:3-15 (2005) DOI: 10.1007/s00357-005-0003-x On the Uniqueness of the Selection Criterion in Neighbor-Joining David Bryant McGill University, Montreal Abstract: The Neighbor-Joining
More informationThe Complexity of Constructing Evolutionary Trees Using Experiments
The Complexity of Constructing Evolutionary Trees Using Experiments Gerth Stlting Brodal 1,, Rolf Fagerberg 1,, Christian N. S. Pedersen 1,, and Anna Östlin2, 1 BRICS, Department of Computer Science, University
More informationAnalytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites
Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites Benny Chor Michael Hendy David Penny Abstract We consider the problem of finding the maximum likelihood rooted tree under
More informationCSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary
CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch
More informationINVARIANTS STEVEN N. EVANS AND XIAOWEN ZHOU. Abstract. The method of invariants is an approach to the problem of reconstructing
DIFFERENT TREES HAVE DISTINCT PHLOGENETIC INVARIANTS STEVEN N. EVANS AND XIAOWEN ZHOU Abstract. The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection
More informationEffects of Gap Open and Gap Extension Penalties
Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See
More informationLecture Notes: Markov chains
Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity
More informationThe Generalized Neighbor Joining method
The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to
More informationPhylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University
Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary
More informationPlan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method
Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More information9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)
I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by
More informationLetter to the Editor. Department of Biology, Arizona State University
Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona
More informationk-protected VERTICES IN BINARY SEARCH TREES
k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from
More informationarxiv: v1 [cs.cc] 9 Oct 2014
Satisfying ternary permutation constraints by multiple linear orders or phylogenetic trees Leo van Iersel, Steven Kelk, Nela Lekić, Simone Linz May 7, 08 arxiv:40.7v [cs.cc] 9 Oct 04 Abstract A ternary
More informationTheory of Evolution Charles Darwin
Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties
More informationEstimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057
Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number
More informationTree-average distances on certain phylogenetic networks have their weights uniquely determined
Tree-average distances on certain phylogenetic networks have their weights uniquely determined Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu
More informationRealization Plans for Extensive Form Games without Perfect Recall
Realization Plans for Extensive Form Games without Perfect Recall Richard E. Stearns Department of Computer Science University at Albany - SUNY Albany, NY 12222 April 13, 2015 Abstract Given a game in
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationPhylogeny of Mixture Models
Phylogeny of Mixture Models Daniel Štefankovič Department of Computer Science University of Rochester joint work with Eric Vigoda College of Computing Georgia Institute of Technology Outline Introduction
More informationPhylogenetics. BIOL 7711 Computational Bioscience
Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium
More informationExact Algorithms and Experiments for Hierarchical Tree Clustering
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Exact Algorithms and Experiments for Hierarchical Tree Clustering Jiong Guo Universität des Saarlandes jguo@mmci.uni-saarland.de
More informationPhylogeny Estimation and Hypothesis Testing using Maximum Likelihood
Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3
More informationFast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study
Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study Li-San Wang Robert K. Jansen Dept. of Computer Sciences Section of Integrative Biology University of Texas, Austin,
More informationPhylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches
Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell
More informationStructure-Based Comparison of Biomolecules
Structure-Based Comparison of Biomolecules Benedikt Christoph Wolters Seminar Bioinformatics Algorithms RWTH AACHEN 07/17/2015 Outline 1 Introduction and Motivation Protein Structure Hierarchy Protein
More informationSolving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.
Solving the Maximum Agreement Subtree and Maximum Compatible Tree problems on bounded degree trees LIRMM, Montpellier France 4th July 2006 Introduction The Mast and Mct problems: given a set of evolutionary
More informationRECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION
RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION MAGNUS BORDEWICH, KATHARINA T. HUBER, VINCENT MOULTON, AND CHARLES SEMPLE Abstract. Phylogenetic networks are a type of leaf-labelled,
More informationLecture 4. Models of DNA and protein change. Likelihood methods
Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36
More informationIs the equal branch length model a parsimony model?
Table 1: n approximation of the probability of data patterns on the tree shown in figure?? made by dropping terms that do not have the minimal exponent for p. Terms that were dropped are shown in red;
More informationMichael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D
7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood
More informationCHAPTERS 24-25: Evidence for Evolution and Phylogeny
CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology
More informationLecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22
Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and
More informationc 1999 Society for Industrial and Applied Mathematics
SIAM J. COMPUT. Vol. 28, No. 3, pp. 1073 1085 c 1999 Society for Industrial and Applied Mathematics ON THE APPROXIMABILITY OF NUMERICAL TAXONOMY (FITTING DISTANCES BY TREE METRICS) RICHA AGARWALA, VINEET
More informationProperties of normal phylogenetic networks
Properties of normal phylogenetic networks Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu August 13, 2009 Abstract. A phylogenetic network is
More informationAlgorithms in Computational Biology (236522) spring 2008 Lecture #1
Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??
More informationImproving divergence time estimation in phylogenetics: more taxa vs. longer sequences
Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal
More informationInferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies
Inferring Phylogenetic Trees Distance Approaches Representing distances in rooted and unrooted trees The distance approach to phylogenies given: an n n matrix M where M ij is the distance between taxa
More informationThe Power of Amnesia: Learning Probabilistic. Automata with Variable Memory Length
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length DANA RON YORAM SINGER NAFTALI TISHBY Institute of Computer Science, Hebrew University, Jerusalem 9904, Israel danar@cs.huji.ac.il
More informationGraphs, permutations and sets in genome rearrangement
ntroduction Graphs, permutations and sets in genome rearrangement 1 alabarre@ulb.ac.be Universite Libre de Bruxelles February 6, 2006 Computers in Scientic Discovery 1 Funded by the \Fonds pour la Formation
More informationDistance Corrections on Recombinant Sequences
Distance Corrections on Recombinant Sequences David Bryant 1, Daniel Huson 2, Tobias Kloepper 2, and Kay Nieselt-Struwe 2 1 McGill Centre for Bioinformatics 3775 University Montréal, Québec, H3A 2B4 Canada
More informationHMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM
I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington
More informationReading for Lecture 13 Release v10
Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................
More informationExtracted from a working draft of Goldreich s FOUNDATIONS OF CRYPTOGRAPHY. See copyright notice.
106 CHAPTER 3. PSEUDORANDOM GENERATORS Using the ideas presented in the proofs of Propositions 3.5.3 and 3.5.9, one can show that if the n 3 -bit to l(n 3 ) + 1-bit function used in Construction 3.5.2
More informationReconstruire le passé biologique modèles, méthodes, performances, limites
Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire
More informationThe least-squares approach to phylogenetics was first suggested
Combinatorics of least-squares trees Radu Mihaescu and Lior Pachter Departments of Mathematics and Computer Science, University of California, Berkeley, CA 94704; Edited by Peter J. Bickel, University
More informationRECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS
RECOVERING A PHYLOGENETIC TREE USING PAIRWISE CLOSURE OPERATIONS KT Huber, V Moulton, C Semple, and M Steel Department of Mathematics and Statistics University of Canterbury Private Bag 4800 Christchurch,
More informationNOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS
NOTE ON THE HYBRIDIZATION NUMBER AND SUBTREE DISTANCE IN PHYLOGENETICS PETER J. HUMPHRIES AND CHARLES SEMPLE Abstract. For two rooted phylogenetic trees T and T, the rooted subtree prune and regraft distance
More informationLecture 11 : Asymptotic Sample Complexity
Lecture 11 : Asymptotic Sample Complexity MATH285K - Spring 2010 Lecturer: Sebastien Roch References: [DMR09]. Previous class THM 11.1 (Strong Quartet Evidence) Let Q be a collection of quartet trees on
More informationAlgebraic Statistics Tutorial I
Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University June 9, 2012 Seth Sullivant (NCSU) Algebraic Statistics June 9, 2012 1 / 34 Introduction to Algebraic Geometry Let R[p] =
More informationPhylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science
Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.
More informationMath 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E.
Math 239: Discrete Mathematics for the Life Sciences Spring 2008 Lecture 14 March 11 Lecturer: Lior Pachter Scribe/ Editor: Maria Angelica Cueto/ C.E. Csar 14.1 Introduction The goal of today s lecture
More informationOrganisatorische Details
Organisatorische Details Vorlesung: Di 13-14, Do 10-12 in DI 205 Übungen: Do 16:15-18:00 Laborraum Schanzenstrasse Vorwiegend Programmieren in Matlab/Octave Teilnahme freiwillig. Übungsblätter jeweils
More informationPitfalls of Heterogeneous Processes for Phylogenetic Reconstruction
Pitfalls of Heterogeneous Processes for Phylogenetic Reconstruction Daniel Štefankovič Eric Vigoda June 30, 2006 Department of Computer Science, University of Rochester, Rochester, NY 14627, and Comenius
More information3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM
I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,
More information