Evolutionary Trees R.C.T. Lee and Chin Lung Lu CS 5313 Algorithms for Molecular Biology Evolutionary Trees p.1 Evolutionary tree To describe the evolutionary relationship among species Root A 3 Bifurcating Speciation Extinct Ancestor A 4 Internal Node A4 A Extinct Ancestor Leaf Extant Species Evolutionary Trees p.
Construction of Evolutionary Tree Man Monkey Dog Horse Donkey Pig Rabbit Kangaroo 3.5 Man Monkey Dog Horse Donkey Pig Rabbit Kangaroo 0 1 13 17 16 13 1 1 0 1 16 15 1 11 13 0 10 8 4 6 7 0 1 5 11 11 0 4 10 1 0 6 7 0 7 Distance Matrix M 0 Kangaroo.5 9 3.5 0.5 5,5 3 1 0.5 0.5 Man Monkey Horse Donkey Dog Pig Rabbit Evolutionary Tree T Evolutionary Trees p.3 Rooted Evolutionary Tree The degree of each internal node i, except the root node. A 1 A A 3 A4 Evolutionary Trees p.4
Unrooted Evolutionary Tree Thedegreeofeachinternalnodeis3. A A 4 A 3 Evolutionary Trees p.5 Number of unrooted trees n = n =3 No of Trees Structure of Trees No of Edges 1 1 1 3 ➊ n =4 3 5 Evolutionary Trees p.6
Number of unrooted trees How to add a new species into the tree? ➋ s 3 If a new species is added to an unrooted tree, the number of edges is increased by. NE(n): number of edges of an unrooted tree with n species By induction, wehavene(n) =n 3. Evolutionary Trees p.7 Number of unrooted trees ➌ TU(n): number of unrooted trees for n species Since NE(n) =n 3, wehave TU(n +1)=(n 3)TU(n) or TU(n) =(n 5)TU(n 1) That is, TU(n) =(n 5)(n 7) 1 Evolutionary Trees p.8
Number of unrooted trees ➍ No of Species No of Unrooted Trees 1 3 1 4 3.. 17 6,190,83,353,69,375 18 191,898,783,96,510,65 19 6,33,659,870,76,850,65 0 1,643,095,476,699,771,875 Evolutionary Trees p.9 Change unrooted into rooted ➊ ➋ root root ➌ root Evolutionary Trees p.10
Number of rooted trees TR(n): number of rooted trees for n species Since there are n 3 edges in every unrooted tree for n species, we have ➊ TR(n) = (n 3)TU(n) =(n 3)(n 5)(n 7) 1 = TU(n +1) Evolutionary Trees p.11 Number of rooted trees ➋ No of Species No of Rooted Trees 1 3 3 4 15.. 17 191,898,783,96,510,65 18 6,33,659,870,76,850,66 19 1,643,095,476,699,771,875 0 8,00,794,53,637,891,559,375 Evolutionary Trees p.1
Evolution Tree Problem Input: A distance matrix M of n species Output: Find an evolution tree T under some criterion Basic criterion: d T (s i,s j ) d M (s i,s j ) d M (s i,s j ): the distance between species s i and s j in M d T (s i,s j ): the distance between s i and s j in T Further criterion: when two species are close to each other in the distance matrix, they should be close in the evolution tree Evolutionary Trees p.13 Criteria of Evolution Tree MINIMAX evolution tree: an evolutionary tree with d T (s i,s j ) d M (s i,s j ) such that max [d T (s i,s j ) d M (s i,s j )] is minimized 1 i<j n MINISUM evolution tree: an evolutionary tree with d T (s i,s j ) d M (s i,s j ) such that 1 i<j n d T (s i,s j ) is minimized MINISIZE evolution tree: an evolutionary tree with d T (s i,s j ) d M (s i,s j ) such that the total length of the tree is minimized Evolutionary Trees p.14
Complexities of Evolutionary Tree MINIMAX MINISUM MINISIZE Rooted O(n ) NP-C NP-c Unrooted NP-C NP-C? It is still unknown whether the unrooted MINISIZE evolutionary tree problem is NP-complete. Evolutionary Trees p.15 Minimax rooted tree algorithm Input: a distance matrix of a set S of n species 1. If S contains only one species, return it as the tree.. Find a minimal spanning tree T of S. 3. Find the longest d(s i,s j ) in the distance matrix. Find the longest edge e in the path linking s i and s j in T. Let S i and S j be the two sets of species obtained by breaking edge e. 4. Use this algorithm recursively to find subtrees T i and T j for S i and S j respectively. 5. Construct a rooted tree with T i and T j as subtrees such that dt(s i,s j )=d(s i,s j ). Evolutionary Trees p.16
Basic priciple of Minimax Let s i and s j be the two species which have the longest distance in the distance matrix. 1 d(s i,s j ) T i T j s i s j The longest distance is exactly preserved. Evolutionary Trees p.17 Example of Minimax algorithm 1. Consider the distance matrix as follows: ➊ 0 3 3.1 0 3.6 5 0 1 0. Construct a minimal spanning tree T as follows: 3 1 Evolutionary Trees p.18
Example of Minimax algorithm 3. The distance between and is the longest. ➋ 0 3 3.1 0 3.6 5 0 1 0 4. The path linking and in T in which (, ) is the longest edge. 3 1 Evolutionary Trees p.19 Example of Minimax algorithm 5. Breaking (, ) obtains two subsets of species 1 ➌ S 1 S 6. Construct two subtrees T 1 and T for S 1 and S respectively 1 1 0.5 0.5 T 1 T Evolutionary Trees p.0
Example of Minimax algorithm 7. Combine T 1 and T by making sure that dt(, )=d(, )=5 ➍ 1.5 1 1 0.5 0.5 Evolutionary Trees p.1 Determination of edge weights How to determine the edge weights if the evolutionary tree structure is given? Example: Givena distance matrix and an unrooted evolutionary tree, determne the edge weights such that the tree size is minimized x 1 x 3 x 4 x x 5 Evolutionary Trees p.
Determination of edge weights x 4 x 5 Determine x i by linear programming Minimize x 1 + x + x 3 + x 4 + x 5 x 1 + x d 1 x 3 x 1 x Subject to x 1 + x 3 + x 4 d 13 x 1 + x 3 + x 5 d 14 x + x 3 + x 4 d 3 Unrooted Tree x + x 3 + x 5 d 4 x 4 + x 5 d 34 Evolutionary Trees p.3 Determination of edge weights Minimize x 1 + x + x 3 + x 4 + x 5 + x 6 Subject to x 1 + x d 1 x 5 x 6 x 1 x x 3 x 4 Rooted Tree x 1 + x 5 + x 6 + x 3 d 13 x 1 + x 5 + x 6 + x 4 d 14 x + x 5 + x 6 + x 3 d 3 x + x 5 + x 6 + x 4 d 4 x 3 + x 4 d 34 x 5 + x 1 = x 5 + x = x 6 + x 3 = x 6 + x 4 Evolutionary Trees p.4
UPGMA: create a rooted tree UPGMA: Unweighted Pair Group Method with Arithmetic Mean Example: consider the distance matrix as follows 0 4 4 3 0 6 5 0 0 Output: a rooted evolutionary tree ➊ Evolutionary Trees p.5 UPGMA: create a rooted tree ➋ 1. Select the pair of species (, ) with the smallest distance andconstructarootedtreewith and as leaf nodes (, ) 1 d(, )=1 Evolutionary Trees p.6
UPGMA: create a rooted tree. Consider (, ) as a new species and the distances are updated as follows: d(, (, )) = 1 (d(, )+d(, )) = 1 (4 + 3) = 3.5 d(, (, )) = 1 (d(, )+d(, )) = 1 (6 + 5) = 5.5 (, ) 0 4 3.5 0 5.5 (, ) 0 ➌ Evolutionary Trees p.7 UPGMA: create a rooted tree ➍ 3. Select the pair of species (, (, )) with the smallest distance and construct a rooted as follows: (, (, )) (, ) 0.75 1 d(, (, )) = 1.75 1 Evolutionary Trees p.8
UPGMA: create a rooted tree 4. Since istheonlyspecieleft,thefinaltreewill look like as follows: ➎ 0.75 0.75 1.75 1 d(, (, (, ))) = 4+5+6 3 =.5 1 Evolutionary Trees p.9 Neighbor joining: unrooted tree ➊ Construct a starlike tree as follows: L 1X L X X L 5X s 5 L 3X L 4X Tree length n S 0 = L ix = 1 n 1 i=1 1 i<j n d(s i,s j ) Evolutionary Trees p.30
Neighbor joining: unrooted tree ➋ Consider the tree T 1 as follows: L 1Y L 3X Y L XY X L 4X L Y L 5X s 5 [ 1 n L XY = (d(,s k )+d(,s k )) (n ) k=3 ] n (n )(L 1Y + L Y ) L ix i=3 Evolutionary Trees p.31 Neighbor joining: unrooted tree ➌ ThetreelengthS 1 of T 1 is: S 1 = L XY + (L 1Y + L Y ) + = 1 n i=3 L ix n (n ) k=3 (d(,s k )+d(,s k )) + 1 d(, )+ 1 n 3 i<j n d(s i,s j ) L 1Y L 3X Y L XY X L 4X L Y L 5X s 5 Evolutionary Trees p.3
Neighbor joining: unrooted tree ➍ Consider (, ) as a new speices s (1) and the distances are updated as follows: for 3 i n d(s (1),s i )= d(,s i )+d(,s i ) L 3X s (1) L (1)X X L 4X L 5X s 5 Evolutionary Trees p.33 Neighbor joining: algorithm ➎ 1. For all pairs of s i and s j, we compute the tree size S ij of tree T ij.( n(n 1) such trees). Choose the pair with the smallest tree size. Suppose S 1 is such a pair. Then consider (, ) as a new speices s (1) and the distances are updated as follows: d(s (1),s i )= d(,s i )+d(,s i ) for 3 i n 3. The number of species is reduced by one and for the new distance matrix, the above procedure is again applied to find the next pair of neighbors until the number of species become. Evolutionary Trees p.34
Neighbor joining: example Example: consider the distance matrix as follows ➏ d(s i,s j ) 0 4 4 3 0 6 5 0 0 L X X L 3X L 1X L 4X Evolutionary Trees p.35 Neighbor joining: example ➐ S ij 7.5 8.5 8.5 7.75 8.5 7.5 L 1Y L 3X Y L XX X L Y L 4X Evolutionary Trees p.36
Neighbor joining: example ➑ d(s i,s j ) s (1) 4+6 s (1) 0 =5 3+5 =4 0 0 L 3X s (1) L (1)X X L 4X Evolutionary Trees p.37 Estimate edge lengths Fitch and Margoliash s (1967) mehtod: x z s y d(, )=x + y, d(, )=x + z, d(, )=y + z x = d(, )+d(, ) d(, ) y = d(, )+d(, ) d(, ) z = d(, )+d(, ) d(, ) Evolutionary Trees p.38
Estimate edge lengths L 1Y1 = d(, )+d(,s (345) ) d(,s (345) ) d(,s (345) )= d(, )+d(, )+d(,s 5 ) 3 d(,s (345) )= d(, )+d(, )+d(,s 5 ) 3 L Y1 = d(, )+d(,s (345) ) d(,s (345) ) L 1Y1 L 3X Y1 L XY1 X L 4X s (345) L Y1 L 5X s 5 Evolutionary Trees p.39 Estimate edge lengths Compute L (1)Y and L 3Y using the same way. s (1) L 1Y1 L Y1 Y 1 L (1)Y Y L XY X L 4X s (45) L 3Y L 5X s 5 L (1)Y = L 1Y 1 +L Y1 Y +L Y1 +L Y1 Y = L Y1 Y + L 1Y 1 +L Y1 = L Y1 Y + d(, ) L Y1 Y = L (1)Y d(, ) Evolutionary Trees p.40
Unrooted minisize: approximation How to design a -approximation algorithm for the unrooted minisize evolution tree problem? d(, ) 0 4 4 3 0 6 5 0 0 3 x 1 x 0 0 Evolutionary Trees p.41 Unrooted minisize: approximation Construct a minimal spanning tree out of the distance matrix d(, ) 0 4 4 3 0 6 5 0 0 4 6 3 5 4 Evolutionary Trees p.4
Unrooted minisize: approximation Conduct a bradth first search on the minimal spaning tree by choosing an arbitrary node as root Start from the root and visit all of the first level descendants first, then the second level descendants, and so on. 4 6 3 5 4 ➋ ➊ ➌ ➍ Evolutionary Trees p.43 Unrooted minisize: approximation Transform the minimal spanning tree into an evolutionary tree by adding nodes one by one ➊ ➋ ➌ 3 x 1 x 1 x 3 0 0 0 Evolutionary Trees p.44
Unrooted minisize: approximation The created tree above is indeed an evolution tree: Each species is at leaf. Thedegreeofeachinternalnodeisthree. For any two species s i and s j,wehave d T (s i,s j ) d(s i,s j ) because the distance d T (s i,s j ) between any two species s i and s j on the evolution tree is exactly the same as that on the minimal spanning tree, which is then d(s i,s j ) by triangular inequality Evolutionary Trees p.45 Unrooted minisize: approximation How to prove AP P < OP T? AP P : the length of our approximate solution OP T : an optimal solution 1. AP P = MST < TSP MST: the mininal spanning tree of distance matrix TSP: the optimal solution of the travelling salesperson problem for distance matrix. TSP OP T Evolutionary Trees p.46
Unrooted minisize: approximation AP P = MST < TSP By construction, AP P = MST TSP: findahamiltonian cycle (visiting all of the nodes exactly once) with minimum length Deleting any edge from TSP obtains a spanning tree 4 6 5 3 4 Evolutionary Trees p.47 Unrooted minisize: approximation TSP OP T ET: aneuler tour (visiting all of edges exactly once) ofopt( ET = OP T ) CET: the cycle of spcies corresponding to ET ( TSP CET CET is a Hamilt. cycle) CET ET since d T (s i,s j ) d(s i,s j ) 3 4 1 Evolutionary Trees p.48