Evolutionary Trees. R.C.T. Lee and Chin Lung Lu


Evolutionary Trees
R.C.T. Lee and Chin Lung Lu
CS 5313 Algorithms for Molecular Biology

Evolutionary tree
An evolutionary tree describes the evolutionary relationships among species.
[Figure: a rooted bifurcating tree. The root and the internal nodes are extinct ancestors, each bifurcation is a speciation event, and the leaves A1, A2, A3, A4 are extant species.]

Construction of an evolutionary tree

Distance matrix M:

           Man  Monkey  Dog  Horse  Donkey  Pig  Rabbit  Kangaroo
Man          0       1   13     17      16   13      12        12
Monkey               0   12     16      15   12      11        13
Dog                       0     10       8    4       6         7
Horse                            0       1    5      11        11
Donkey                                   0    4      10        12
Pig                                           0       6         7
Rabbit                                                0         7
Kangaroo                                                        0

[Figure: evolutionary tree T built from M, with leaves Man, Monkey, Horse, Donkey, Dog, Pig, Rabbit, and Kangaroo and weighted edges.]

Rooted evolutionary tree
The degree of each internal node is 3, except for the root node.
[Figure: a rooted tree with leaves A1, A2, A3, A4.]

Unrooted evolutionary tree
The degree of each internal node is 3.
[Figure: an unrooted tree with leaves A1, A2, A3, A4.]

Number of unrooted trees

n    No. of trees    No. of edges
2    1               1
3    1               3
4    3               5

[Figure: structure of the trees for n = 2, 3, and 4.]

Number of unrooted trees
How do we add a new species to the tree? A new species can be attached to any edge of the current tree.
If a new species is added to an unrooted tree, the number of edges increases by 2.
NE(n): number of edges of an unrooted tree with n species. By induction, we have $NE(n) = 2n - 3$.

TU(n): number of unrooted trees for n species. Since $NE(n) = 2n - 3$, we have
$TU(n + 1) = (2n - 3)\, TU(n)$, or equivalently $TU(n) = (2n - 5)\, TU(n - 1)$.
That is, $TU(n) = (2n - 5)(2n - 7) \cdots 3 \cdot 1$.

Number of unrooted trees

No. of species    No. of unrooted trees
2                 1
3                 1
4                 3
...               ...
17                6,190,283,353,629,375
18                191,898,783,962,510,625
19                6,332,659,870,762,850,625
20                221,643,095,476,699,771,875

Change unrooted into rooted
[Figure: an unrooted tree is converted into a rooted tree by placing the root on one of its edges; three different placements of the root are shown.]

Number of rooted trees
TR(n): number of rooted trees for n species. Since there are $2n - 3$ edges in every unrooted tree for n species, and the root can be placed on any one of them, we have
$TR(n) = (2n - 3)\, TU(n) = (2n - 3)(2n - 5)(2n - 7) \cdots 3 \cdot 1 = TU(n + 1)$.

No. of species    No. of rooted trees
2                 1
3                 3
4                 15
...               ...
17                191,898,783,962,510,625
18                6,332,659,870,762,850,625
19                221,643,095,476,699,771,875
20                8,200,794,532,637,891,559,375
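The two counting formulas can be checked numerically. The following is a minimal sketch; the function names `unrooted_tree_count` and `rooted_tree_count` are illustrative and do not come from the slides.

```python
def unrooted_tree_count(n: int) -> int:
    """TU(n) = (2n-5)(2n-7)...3*1, with TU(2) = 1."""
    if n < 2:
        raise ValueError("need at least two species")
    count = 1
    # Growing a tree from k to k+1 species: the new species can be attached
    # to any of the NE(k) = 2k - 3 edges of the current tree.
    for k in range(2, n):
        count *= 2 * k - 3
    return count


def rooted_tree_count(n: int) -> int:
    """TR(n) = (2n-3) * TU(n): the root can sit on any of the 2n-3 edges."""
    return (2 * n - 3) * unrooted_tree_count(n)


if __name__ == "__main__":
    for n in (2, 3, 4, 17, 18, 19, 20):
        print(n, unrooted_tree_count(n), rooted_tree_count(n))
```

Python integers have arbitrary precision, so the 22-digit values for n = 20 are computed exactly.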

Evolution tree problem
Input: a distance matrix M of n species.
Output: an evolution tree T that is best under some criterion.
Basic criterion: $d_T(s_i, s_j) \ge d_M(s_i, s_j)$, where
$d_M(s_i, s_j)$ is the distance between species $s_i$ and $s_j$ in M, and
$d_T(s_i, s_j)$ is the distance between $s_i$ and $s_j$ in T.
Further criterion: when two species are close to each other in the distance matrix, they should be close in the evolution tree.

Criteria of evolution tree
MINIMAX evolution tree: an evolutionary tree with $d_T(s_i, s_j) \ge d_M(s_i, s_j)$ such that $\max_{1 \le i < j \le n} [d_T(s_i, s_j) - d_M(s_i, s_j)]$ is minimized.
MINISUM evolution tree: an evolutionary tree with $d_T(s_i, s_j) \ge d_M(s_i, s_j)$ such that $\sum_{1 \le i < j \le n} d_T(s_i, s_j)$ is minimized.
MINISIZE evolution tree: an evolutionary tree with $d_T(s_i, s_j) \ge d_M(s_i, s_j)$ such that the total length of the tree is minimized.
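As a small illustration of the basic criterion and of the MINIMAX and MINISUM objectives, the sketch below evaluates a candidate tree that is given only through its leaf-to-leaf distances. The function names and the two 3-species matrices are hypothetical and chosen for illustration; they are not data from the slides.

```python
from itertools import combinations


def is_feasible(d_tree, d_matrix):
    """Basic criterion: d_T(s_i, s_j) >= d_M(s_i, s_j) for every pair."""
    n = len(d_matrix)
    return all(d_tree[i][j] >= d_matrix[i][j]
               for i, j in combinations(range(n), 2))


def minimax_cost(d_tree, d_matrix):
    """MINIMAX objective: the largest excess d_T - d_M over all pairs."""
    n = len(d_matrix)
    return max(d_tree[i][j] - d_matrix[i][j]
               for i, j in combinations(range(n), 2))


def minisum_cost(d_tree):
    """MINISUM objective: the sum of all pairwise tree distances."""
    n = len(d_tree)
    return sum(d_tree[i][j] for i, j in combinations(range(n), 2))


# Hypothetical input matrix and the pairwise distances of a candidate tree.
d_m = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
d_t = [[0, 2, 5], [2, 0, 5], [5, 5, 0]]

if __name__ == "__main__":
    print(is_feasible(d_t, d_m), minimax_cost(d_t, d_m), minisum_cost(d_t))
```

The MINISIZE objective is not covered here because it needs the edge lengths of the tree, not only the pairwise distances.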

Complexities of evolutionary tree problems

            MINIMAX    MINISUM    MINISIZE
Rooted      O(n^2)     NP-C       NP-C
Unrooted    NP-C       NP-C       ?

It is still unknown whether the unrooted MINISIZE evolutionary tree problem is NP-complete.

Minimax rooted tree algorithm
Input: a distance matrix of a set S of n species.
1. If S contains only one species, return it as the tree.
2. Find a minimal spanning tree T of S.
3. Find the longest distance $d(s_i, s_j)$ in the distance matrix. Find the longest edge e on the path linking $s_i$ and $s_j$ in T. Let $S_i$ and $S_j$ be the two sets of species obtained by breaking edge e.
4. Use this algorithm recursively to find subtrees $T_i$ and $T_j$ for $S_i$ and $S_j$, respectively.
5. Construct a rooted tree with $T_i$ and $T_j$ as subtrees such that $d_T(s_i, s_j) = d(s_i, s_j)$.
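The five steps above can be sketched in Python as follows. This is a simple illustration, not the O(n^2) implementation: it rebuilds the spanning tree with a naive Prim loop at every level of the recursion. The tuple representation `(height, left, right)`, the helper names, and the choice to satisfy step 5 by placing the new root at height d(s_i, s_j) / 2 are assumptions made for this sketch, as is the example matrix.

```python
def find_path(adj, start, goal):
    """Return the unique path from start to goal in a tree (adjacency lists)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("no path between the two species")


def component(adj, start):
    """Return the set of nodes reachable from start."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def minimax_tree(dist, species=None):
    """Return a leaf index, or (root_height, left_subtree, right_subtree)."""
    if species is None:
        species = list(range(len(dist)))
    if len(species) == 1:                       # step 1
        return species[0]
    # Step 2: minimal spanning tree of the current species (naive Prim).
    in_tree = [species[0]]
    adj = {s: [] for s in species}
    while len(in_tree) < len(species):
        u, v = min(((a, b) for a in in_tree for b in species if b not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        adj[u].append(v)
        adj[v].append(u)
        in_tree.append(v)
    # Step 3: farthest pair, then the longest edge on their spanning-tree path.
    si, sj = max(((a, b) for a in species for b in species if a < b),
                 key=lambda p: dist[p[0]][p[1]])
    path = find_path(adj, si, sj)
    a, b = max(zip(path, path[1:]), key=lambda e: dist[e[0]][e[1]])
    adj[a].remove(b)
    adj[b].remove(a)
    side_i = component(adj, si)
    side_j = [s for s in species if s not in side_i]
    # Step 4: recurse on both sides of the broken edge.
    left = minimax_tree(dist, sorted(side_i))
    right = minimax_tree(dist, sorted(side_j))
    # Step 5: root at half the longest distance, so d_T(s_i, s_j) = d(s_i, s_j).
    return (dist[si][sj] / 2, left, right)


# Illustrative 4-species matrix (indices 0..3 stand for s1..s4).
D = [[0, 2, 3, 3.1],
     [2, 0, 3.6, 5],
     [3, 3.6, 0, 1],
     [3.1, 5, 1, 0]]

if __name__ == "__main__":
    print(minimax_tree(D))
```

In the returned tuple, the length of the edge from a node to a child is the difference of their heights, with leaves at height 0.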

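Basic principle of Minimax
Let $s_i$ and $s_j$ be the two species that have the longest distance in the distance matrix. The root is placed at height $\frac{1}{2} d(s_i, s_j)$, with $s_i$ in subtree $T_i$ and $s_j$ in subtree $T_j$, so the longest distance is exactly preserved.
[Figure: a rooted tree with subtrees T_i and T_j, containing s_i and s_j, and root height (1/2) d(s_i, s_j).]

Example of Minimax algorithm
1. Consider the following distance matrix:

      s1    s2    s3    s4
s1    0     2     3     3.1
s2          0     3.6   5
s3                0     1
s4                      0

2. Construct a minimal spanning tree T:
[Figure: minimal spanning tree with edges (s1, s2) of length 2, (s1, s3) of length 3, and (s3, s4) of length 1.]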

Example of Minimax algorithm
3. The distance between s2 and s4 is the longest:

      s1    s2    s3    s4
s1    0     2     3     3.1
s2          0     3.6   5
s3                0     1
s4                      0

4. On the path linking s2 and s4 in T, (s1, s3) is the longest edge.
[Figure: the path s2, s1, s3, s4 in T with edge lengths 2, 3, and 1.]

5. Breaking (s1, s3) yields two subsets of species, S1 = {s1, s2} and S2 = {s3, s4}.
6. Construct two subtrees T1 and T2 for S1 and S2, respectively.
[Figure: T1 joins s1 and s2 with edges of length 1 each; T2 joins s3 and s4 with edges of length 0.5 each.]

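Example of Minimax algorithm
7. Combine T1 and T2 by making sure that $d_T(s_2, s_4) = d(s_2, s_4) = 5$.
[Figure: the combined rooted tree. The root is joined to the root of T1 by an edge of length 1.5 and to the root of T2 by an edge of length 2; the leaf edges have lengths 1 and 1 in T1 and 0.5 and 0.5 in T2.]

Determination of edge weights
How do we determine the edge weights if the evolutionary tree structure is given?
Example: given a distance matrix and an unrooted evolutionary tree, determine the edge weights such that the tree size is minimized.
[Figure: an unrooted tree on s1, s2, s3, s4 with edge weights x1, x2, x3, x4, x5.]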

Determination of edge weights
Determine the $x_i$ by linear programming.

Unrooted tree: $x_1$ and $x_2$ are the edges to s1 and s2, $x_4$ and $x_5$ are the edges to s3 and s4, and $x_3$ is the internal edge.
Minimize $x_1 + x_2 + x_3 + x_4 + x_5$
subject to
$x_1 + x_2 \ge d_{12}$
$x_1 + x_3 + x_4 \ge d_{13}$
$x_1 + x_3 + x_5 \ge d_{14}$
$x_2 + x_3 + x_4 \ge d_{23}$
$x_2 + x_3 + x_5 \ge d_{24}$
$x_4 + x_5 \ge d_{34}$

Rooted tree: $x_1$ and $x_2$ are the edges to s1 and s2, $x_3$ and $x_4$ are the edges to s3 and s4, and $x_5$ and $x_6$ are the two edges leaving the root.
Minimize $x_1 + x_2 + x_3 + x_4 + x_5 + x_6$
subject to
$x_1 + x_2 \ge d_{12}$
$x_1 + x_5 + x_6 + x_3 \ge d_{13}$
$x_1 + x_5 + x_6 + x_4 \ge d_{14}$
$x_2 + x_5 + x_6 + x_3 \ge d_{23}$
$x_2 + x_5 + x_6 + x_4 \ge d_{24}$
$x_3 + x_4 \ge d_{34}$
$x_5 + x_1 = x_5 + x_2 = x_6 + x_3 = x_6 + x_4$
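To make the unrooted constraint system concrete, the sketch below checks a candidate weight vector against the six inequalities and reports the tree size. It does not solve the linear program. The names `PATHS`, `violated_constraints`, and `tree_size`, the distance values, and the candidate weights are all assumptions made for illustration, and no claim is made that the candidate is optimal.

```python
# Leaf-to-leaf paths in the unrooted tree ((s1, s2), (s3, s4)), written as
# the indices of the edge weights x1..x5 that each path uses.
PATHS = {
    (1, 2): (1, 2),
    (1, 3): (1, 3, 4),
    (1, 4): (1, 3, 5),
    (2, 3): (2, 3, 4),
    (2, 4): (2, 3, 5),
    (3, 4): (4, 5),
}


def violated_constraints(x, d):
    """Return the species pairs whose tree distance is shorter than d."""
    return [pair for pair, edges in PATHS.items()
            if sum(x[e] for e in edges) < d[pair]]


def tree_size(x):
    """Objective of the linear program: the sum of all edge weights."""
    return sum(x.values())


# Illustrative distances d_ij and two candidate weight assignments.
d = {(1, 2): 4, (1, 3): 4, (1, 4): 3, (2, 3): 6, (2, 4): 5, (3, 4): 2}
candidate = {1: 1.0, 2: 3.0, 3: 1.5, 4: 1.5, 5: 0.5}
too_short = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}

if __name__ == "__main__":
    print(violated_constraints(candidate, d), tree_size(candidate))
    print(violated_constraints(too_short, d), tree_size(too_short))
```

A linear programming solver would search over all feasible weight vectors; this check only verifies one of them.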

UPGMA: create a rooted tree
UPGMA: Unweighted Pair Group Method with Arithmetic Mean.
Example: consider the following distance matrix.

      s1   s2   s3   s4
s1    0    4    4    3
s2         0    6    5
s3              0    2
s4                   0

Output: a rooted evolutionary tree.

1. Select the pair of species (s3, s4) with the smallest distance and construct a rooted tree with s3 and s4 as leaf nodes. The new node (s3, s4) is placed at height $\frac{1}{2} d(s_3, s_4) = 1$.
[Figure: node (s3, s4) joined to s3 and s4 by edges of length 1.]

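UPGMA: create a rooted tree
2. Consider (s3, s4) as a new species and update the distances as follows:
$d(s_1, (s_3, s_4)) = \frac{1}{2}(d(s_1, s_3) + d(s_1, s_4)) = \frac{1}{2}(4 + 3) = 3.5$
$d(s_2, (s_3, s_4)) = \frac{1}{2}(d(s_2, s_3) + d(s_2, s_4)) = \frac{1}{2}(6 + 5) = 5.5$

            s1   s2   (s3, s4)
s1          0    4    3.5
s2               0    5.5
(s3, s4)              0

3. Select the pair (s1, (s3, s4)) with the smallest distance and construct a rooted tree as follows. The new node (s1, (s3, s4)) is placed at height $\frac{1}{2} d(s_1, (s_3, s_4)) = 1.75$, so it is joined to the node (s3, s4) by an edge of length 0.75.
[Figure: node (s1, (s3, s4)) above s1 and node (s3, s4); s3 and s4 hang from (s3, s4) by edges of length 1.]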

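UPGMA: create a rooted tree
4. Since s2 is the only species left, the final tree looks as follows. The root is placed at height
$\frac{1}{2} d(s_2, (s_1, (s_3, s_4))) = \frac{1}{2} \cdot \frac{4 + 5 + 6}{3} = 2.5$.
[Figure: the final rooted tree. The root is joined to s2 by an edge of length 2.5 and to the node (s1, (s3, s4)) by an edge of length 0.75; that node is joined to s1 by an edge of length 1.75 and to the node (s3, s4) by an edge of length 0.75; s3 and s4 hang from (s3, s4) by edges of length 1.]

Neighbor joining: unrooted tree
Construct a starlike tree in which every species $s_i$ is joined to a central node X by an edge of length $L_{iX}$.
[Figure: a star tree with center X and leaves s1, ..., s5 on edges L_1X, ..., L_5X.]
Tree length:
$S_0 = \sum_{i=1}^{n} L_{iX} = \frac{1}{n-1} \sum_{1 \le i < j \le n} d(s_i, s_j)$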
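Returning to the four UPGMA steps, the procedure can be sketched as a short Python function. The nested-tuple output `(left, right, height)`, the function name `upgma`, and the cluster bookkeeping are assumptions of this sketch. The distance update is the size-weighted arithmetic mean, which reproduces the values 3.5, 5.5, and (4 + 5 + 6) / 3 = 5 used in the example; the matrix assumes d(s3, s4) = 2, consistent with the height of 1 in step 1.

```python
def upgma(dist, names):
    """Build a rooted tree as nested tuples (left, right, height_of_node)."""
    n = len(names)
    # cluster id -> (subtree, number of species in the cluster)
    clusters = {i: (names[i], 1) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of clusters; keys are stored with a < b.
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to every remaining cluster is the
        # average over all species pairs (size-weighted arithmetic mean).
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        # The new internal node sits at half the distance between the pair.
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Example matrix, with indices 0..3 standing for s1..s4.
D = [[0, 4, 4, 3],
     [4, 0, 6, 5],
     [4, 6, 0, 2],
     [3, 5, 2, 0]]

if __name__ == "__main__":
    print(upgma(D, ["s1", "s2", "s3", "s4"]))
```

Edge lengths follow from the heights: for instance, the edge from the root (height 2.5) to the node (s1, (s3, s4)) (height 1.75) has length 0.75.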

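Neighbor joining: unrooted tree
Consider the tree $T_{12}$, in which s1 and s2 are joined to a new node Y by edges of length $L_{1Y}$ and $L_{2Y}$, Y is joined to X by an edge of length $L_{XY}$, and the remaining species stay attached to X:
[Figure: tree T_12 with nodes Y and X and leaves s1, ..., s5.]
$L_{XY} = \frac{1}{2(n-2)} \left[ \sum_{k=3}^{n} (d(s_1, s_k) + d(s_2, s_k)) - (n-2)(L_{1Y} + L_{2Y}) - 2 \sum_{i=3}^{n} L_{iX} \right]$

The tree length $S_{12}$ of $T_{12}$ is:
$S_{12} = L_{XY} + (L_{1Y} + L_{2Y}) + \sum_{i=3}^{n} L_{iX}$
$\phantom{S_{12}} = \frac{1}{2(n-2)} \sum_{k=3}^{n} (d(s_1, s_k) + d(s_2, s_k)) + \frac{1}{2} d(s_1, s_2) + \frac{1}{n-2} \sum_{3 \le i < j \le n} d(s_i, s_j)$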

Neighbor joining: unrooted tree
Consider (s1, s2) as a new species $s_{(12)}$ and update the distances as follows:
$d(s_{(12)}, s_i) = \frac{d(s_1, s_i) + d(s_2, s_i)}{2}$ for $3 \le i \le n$.
[Figure: star tree with center X and leaves s_(12), s3, s4, s5.]

Neighbor joining: algorithm
1. For all pairs of $s_i$ and $s_j$, compute the tree size $S_{ij}$ of tree $T_{ij}$ (there are $\frac{n(n-1)}{2}$ such trees).
2. Choose the pair with the smallest tree size. Suppose $S_{12}$ is the smallest. Then consider (s1, s2) as a new species $s_{(12)}$ and update the distances as follows:
$d(s_{(12)}, s_i) = \frac{d(s_1, s_i) + d(s_2, s_i)}{2}$ for $3 \le i \le n$.
3. The number of species is reduced by one. For the new distance matrix, the above procedure is applied again to find the next pair of neighbors, until the number of species becomes 2.
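One round of the algorithm (steps 1 and 2) can be sketched as follows, using the closed form for the tree size with the pair (s_i, s_j) in place of (s1, s2). The function names `tree_size`, `best_pair`, and `join`, the convention of placing the merged species first in the reduced matrix, and the example matrix are assumptions of this sketch.

```python
from itertools import combinations


def tree_size(dist, i, j):
    """S_ij: length of the tree in which s_i and s_j are joined as neighbors."""
    n = len(dist)
    others = [k for k in range(n) if k not in (i, j)]
    to_pair = sum(dist[i][k] + dist[j][k] for k in others)
    among_others = sum(dist[a][b] for a, b in combinations(others, 2))
    return to_pair / (2 * (n - 2)) + dist[i][j] / 2 + among_others / (n - 2)


def best_pair(dist):
    """Step 2: the pair with the smallest tree size (first one on ties)."""
    return min(combinations(range(len(dist)), 2),
               key=lambda p: tree_size(dist, *p))


def join(dist, i, j):
    """Reduced matrix with s_i and s_j replaced by s_(ij), listed first."""
    others = [k for k in range(len(dist)) if k not in (i, j)]
    merged = [(dist[i][k] + dist[j][k]) / 2 for k in others]
    reduced = [[0.0] + merged]
    for idx, k in enumerate(others):
        reduced.append([merged[idx]] + [dist[k][m] for m in others])
    return reduced


# Illustrative matrix, with indices 0..3 standing for s1..s4.
D = [[0, 4, 4, 3],
     [4, 0, 6, 5],
     [4, 6, 0, 2],
     [3, 5, 2, 0]]

if __name__ == "__main__":
    for i, j in combinations(range(4), 2):
        print(f"S_{i + 1}{j + 1} =", tree_size(D, i, j))
    i, j = best_pair(D)
    print("join", (i + 1, j + 1), "->", join(D, i, j))
```

With four species the pairs (s1, s2) and (s3, s4) describe the same tree, so their sizes tie; `min` keeps the first pair it encounters.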

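Neighbor joining: example
Example: consider the following distance matrix.

d(s_i, s_j)   s1   s2   s3   s4
s1            0    4    4    3
s2                 0    6    5
s3                      0    2
s4                           0

[Figure: star tree with center X and leaves s1, s2, s3, s4 on edges L_1X, L_2X, L_3X, L_4X.]

Neighbor joining: example
Tree sizes for all pairs:

S_ij   s2     s3     s4
s1     7.5    8.25   8.25
s2            8.25   8.25
s3                   7.5

[Figure: tree T_12, in which s1 and s2 are joined to Y by edges L_1Y and L_2Y, Y is joined to X by L_XY, and s3 and s4 are joined to X by L_3X and L_4X.]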

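Neighbor joining: example
Updated distances after joining s1 and s2 into $s_{(12)}$:
$d(s_{(12)}, s_3) = \frac{4 + 6}{2} = 5$, $d(s_{(12)}, s_4) = \frac{3 + 5}{2} = 4$.

d(s_i, s_j)   s(12)   s3   s4
s(12)         0       5    4
s3                    0    2
s4                         0

[Figure: star tree with center X and leaves s_(12), s3, s4.]

Estimate edge lengths
Fitch and Margoliash's (1967) method: for three species s1, s2, s3 joined to a common node by edges of length x, y, and z,
$d(s_1, s_2) = x + y$, $d(s_1, s_3) = x + z$, $d(s_2, s_3) = y + z$,
so that
$x = \frac{d(s_1, s_2) + d(s_1, s_3) - d(s_2, s_3)}{2}$
$y = \frac{d(s_1, s_2) + d(s_2, s_3) - d(s_1, s_3)}{2}$
$z = \frac{d(s_1, s_3) + d(s_2, s_3) - d(s_1, s_2)}{2}$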
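The three-point formulas translate directly into code. The sketch below uses an illustrative function name, `three_point_edges`, and applies it to three pairwise distances 5, 4, and 2 chosen as sample input.

```python
def three_point_edges(d12, d13, d23):
    """Solve x + y = d12, x + z = d13, y + z = d23 for the edge lengths."""
    x = (d12 + d13 - d23) / 2
    y = (d12 + d23 - d13) / 2
    z = (d13 + d23 - d12) / 2
    return x, y, z


if __name__ == "__main__":
    x, y, z = three_point_edges(5, 4, 2)
    print(x, y, z)
    # The three path lengths reproduce the input distances.
    print(x + y, x + z, y + z)
```

If the three distances violate the triangle inequality, one of the computed lengths becomes negative, which is a quick sanity check on the input.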

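Estimate edge lengths
Treat s3, s4, and s5 as a single composite species $s_{(345)}$ and apply the three-point formulas to s1, s2, and $s_{(345)}$:
$L_{1Y_1} = \frac{d(s_1, s_2) + d(s_1, s_{(345)}) - d(s_2, s_{(345)})}{2}$
$L_{2Y_1} = \frac{d(s_1, s_2) + d(s_2, s_{(345)}) - d(s_1, s_{(345)})}{2}$
where
$d(s_1, s_{(345)}) = \frac{d(s_1, s_3) + d(s_1, s_4) + d(s_1, s_5)}{3}$
$d(s_2, s_{(345)}) = \frac{d(s_2, s_3) + d(s_2, s_4) + d(s_2, s_5)}{3}$
[Figure: s1 and s2 joined to Y1 by edges L_1Y1 and L_2Y1; Y1 joined to X by L_XY1; s3, s4, s5 joined to X and grouped as s_(345).]

Estimate edge lengths
Compute $L_{(12)Y_2}$ and $L_{3Y_2}$ in the same way, with $s_{(12)}$ and $s_{(45)}$ as composite species.
[Figure: s1 and s2 joined to Y1 and grouped as s_(12); Y1 and s3 joined to Y2; Y2 joined to X by L_XY2; s4 and s5 joined to X and grouped as s_(45).]
$L_{(12)Y_2} = \frac{(L_{1Y_1} + L_{Y_1Y_2}) + (L_{2Y_1} + L_{Y_1Y_2})}{2} = L_{Y_1Y_2} + \frac{L_{1Y_1} + L_{2Y_1}}{2} = L_{Y_1Y_2} + \frac{d(s_1, s_2)}{2}$
Hence $L_{Y_1Y_2} = L_{(12)Y_2} - \frac{d(s_1, s_2)}{2}$.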

Unrooted minisize: approximation
How can we design a 2-approximation algorithm for the unrooted minisize evolution tree problem?

d(s_i, s_j)   s1   s2   s3   s4
s1            0    4    4    3
s2                 0    6    5
s3                      0    2
s4                           0

[Figure: an evolutionary tree for the four species with weighted edges.]

Unrooted minisize: approximation
Construct a minimal spanning tree out of the distance matrix.
[Figure: the complete graph on s1, s2, s3, s4 with the edge weights of the matrix, and a minimal spanning tree of it.]

Unrooted minisize: approximation
Conduct a breadth-first search on the minimal spanning tree, choosing an arbitrary node as the root. Start from the root and visit all of the first-level descendants first, then the second-level descendants, and so on.
[Figure: the minimal spanning tree with its nodes numbered 1 to 4 in breadth-first order.]

Unrooted minisize: approximation
Transform the minimal spanning tree into an evolutionary tree by adding the nodes one by one, in breadth-first order.
[Figure: three stages of the transformation; the edges introduced when a species is moved out to a leaf have length 0.]

Unrooted minisize: approximation
The tree created above is indeed an evolution tree:
Each species is at a leaf.
The degree of each internal node is three.
For any two species $s_i$ and $s_j$, we have $d_T(s_i, s_j) \ge d(s_i, s_j)$, because the distance $d_T(s_i, s_j)$ between any two species on the evolution tree is exactly the same as their distance on the minimal spanning tree, which is $\ge d(s_i, s_j)$ by the triangle inequality.

Unrooted minisize: approximation
How do we prove APP < 2 OPT?
APP: the length of our approximate solution.
OPT: the length of an optimal solution.
1. APP = MST < TSP, where MST is the length of the minimal spanning tree of the distance matrix and TSP is the length of an optimal solution of the travelling salesperson problem for the distance matrix.
2. TSP $\le$ 2 OPT.

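Unrooted minisize: approximation
APP = MST < TSP
By construction, APP = MST.
TSP: find a Hamiltonian cycle (visiting all of the nodes exactly once) with minimum length.
Deleting any edge from an optimal TSP cycle yields a spanning tree, whose length is at least MST. Hence MST < TSP when all distances are positive.
[Figure: a Hamiltonian cycle on s1, s2, s3, s4 with its edge weights.]

Unrooted minisize: approximation
TSP $\le$ 2 OPT
ET: an Euler tour of OPT that traverses every edge twice, once in each direction, so ET = 2 OPT.
CET: the cycle of species corresponding to ET, obtained by listing the species in the order in which ET first visits them. TSP $\le$ CET, since CET is a Hamiltonian cycle.
CET $\le$ ET, since $d_T(s_i, s_j) \ge d(s_i, s_j)$.
[Figure: an Euler tour around the optimal tree visiting the species in the order 1, 2, 3, 4.]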
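The first inequality of the argument, MST < TSP, can be checked on a small instance. The sketch below computes the minimal spanning tree length with a naive Prim loop and the optimal tour length by brute force over all permutations, which is only practical for a handful of species. The function names and the example matrix are assumptions of this sketch, and OPT itself is not computed.

```python
from itertools import permutations


def mst_weight(dist):
    """Length of a minimal spanning tree (naive Prim, fine for small n)."""
    n = len(dist)
    in_tree = {0}
    total = 0
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        w, v = min((dist[u][v], v)
                   for u in in_tree for v in range(n) if v not in in_tree)
        total += w
        in_tree.add(v)
    return total


def tsp_weight(dist):
    """Length of a shortest Hamiltonian cycle, by exhaustive search."""
    n = len(dist)
    best = None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or length < best:
            best = length
    return best


# Illustrative matrix, with indices 0..3 standing for s1..s4.
D = [[0, 4, 4, 3],
     [4, 0, 6, 5],
     [4, 6, 0, 2],
     [3, 5, 2, 0]]

if __name__ == "__main__":
    print("MST =", mst_weight(D), "TSP =", tsp_weight(D))
```

On this matrix the approximate solution has length APP = MST = 9, and the bound APP < TSP $\le$ 2 OPT then says that no evolution tree satisfying the basic criterion can be shorter than half the optimal tour length.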