Bioinformatics
Gabriel Valiente
Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia

1 Gabriel Valiente (ALBCOM) Bioinformatics / 86

2 Introduction
Course outline (sessions on April 27, May 4, May 11, May 18, May 25 and June 1):
Introduction
Phylogenetic reconstruction: ultrametric trees; additive and non-additive trees; perfect phylogenies
Taxonomic reconstruction: compatibility; consensus; combination

3 Introduction
Michael S. Waterman (University of Southern California). Introduction to Computational Biology. Chapman & Hall.
Dan Gusfield (University of California, Davis). Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press.
Roderic D. M. Page (University of Glasgow) and Edward C. Holmes (University of Oxford). Molecular Evolution: A Phylogenetic Approach. Blackwell Science.
Gabriel Valiente (Technical University of Catalonia). Algorithms on Trees and Graphs. Springer-Verlag.
Neil C. Jones and Pavel A. Pevzner (University of California, San Diego). An Introduction to Bioinformatics Algorithms. The MIT Press.
Arthur M. Lesk (Pennsylvania State University). Introduction to Bioinformatics, 2nd edition. Oxford University Press.

4 Ultrametric trees
The (evolutionary) distance $D_{i,j}$ between two species $i$ and $j$ measures the length of time since the two species diverged.

9 Ultrametric trees
[Figure: a tree of brown bear, polar bear, black bear, spectacled bear, giant panda, raccoon and red panda, with a time axis in millions of years ago (MYA).]
In this drawing there is no correlation between evolutionary distances and edge lengths.

11 Ultrametric trees
[Figure: the same seven species drawn against a time axis in MYA, marked at 40 MYA.]

12 Ultrametric trees
Given a weighted tree $T$ with $n$ leaves, compute the length $d^T_{i,j}$ of the path between any two leaves $i$ and $j$. The length of the path between any two nodes is the sum of the weights of the edges on the path between them. For example, $d^T_{1,5} = 68$, the sum of the weights on the path from leaf 1 to leaf 5.
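A minimal sketch in plain Perl (no BioPerl needed) of computing $d^T_{i,j}$, assuming a hypothetical encoding of the weighted tree as an adjacency hash (node => { neighbour => edge weight }, every edge stored in both directions); the small tree is made up. Because a tree has exactly one path between two nodes, a depth-first search that never steps back to the node it came from finds that path and its length.

```perl
use strict;
use warnings;

# Length of the unique path between nodes $i and $j in a weighted tree.
# %$adj maps node => { neighbour => edge weight } (hypothetical encoding).
sub path_length {
    my ($adj, $i, $j) = @_;
    # Iterative depth-first search; each entry is [node, parent, distance].
    my @stack = ([$i, undef, 0]);
    while (@stack) {
        my ($node, $parent, $dist) = @{ pop @stack };
        return $dist if $node eq $j;
        for my $next (keys %{ $adj->{$node} }) {
            next if defined $parent && $next eq $parent;    # do not walk back
            push @stack, [$next, $node, $dist + $adj->{$node}{$next}];
        }
    }
    return undef;    # $j is not reachable from $i
}

# Made-up tree: leaves a and b hang from u, leaf c hangs from v, u-v is an edge.
my %adj;
for my $e (['a', 'u', 3], ['b', 'u', 4], ['u', 'v', 2], ['c', 'v', 5]) {
    my ($x, $y, $w) = @$e;
    $adj{$x}{$y} = $w;
    $adj{$y}{$x} = $w;
}
print path_length(\%adj, 'a', 'c'), "\n";    # 3 + 2 + 5 = 10
```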

13 Ultrametric trees
Given an $n \times n$ distance matrix $D$, find a tree $T$ with $n$ leaves that fits the data, that is, such that $d^T_{i,j} = D_{i,j}$ for every two leaves $i$ and $j$.
A matrix $D$ is symmetric non-negative if $D_{i,j} = D_{j,i}$ and $D_{i,j} \ge 0$ for all $i$ and $j$.
A matrix $D$ satisfies the triangle inequality if $D_{i,j} + D_{j,k} \ge D_{i,k}$ for all $i$, $j$, and $k$.
A matrix $D$ is a distance matrix if it is symmetric non-negative, it satisfies the triangle inequality, and $D_{i,j} > 0$ for all $i \neq j$.
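The three conditions can be checked directly. A sketch in plain Perl, assuming the matrix is an array of array references; the function name and the two example matrices are ours. Checking the triangle inequality over all triples makes it $O(n^3)$.

```perl
use strict;
use warnings;

# Returns 1 if the n x n matrix $D (array of array refs) is a distance
# matrix in the sense defined above, 0 otherwise.
sub is_distance_matrix {
    my ($D) = @_;
    my $n = scalar @$D;
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $n - 1) {
            return 0 if $D->[$i][$j] != $D->[$j][$i];     # symmetric
            return 0 if $D->[$i][$j] < 0;                 # non-negative
            return 0 if $i != $j && $D->[$i][$j] == 0;    # positive off the diagonal
            for my $k (0 .. $n - 1) {
                # triangle inequality D_ij + D_jk >= D_ik
                return 0 if $D->[$i][$j] + $D->[$j][$k] < $D->[$i][$k];
            }
        }
    }
    return 1;
}

print is_distance_matrix([[0, 3, 4], [3, 0, 5], [4, 5, 0]]), "\n";    # 1
print is_distance_matrix([[0, 1, 5], [1, 0, 1], [5, 1, 0]]), "\n";    # 0, since 1 + 1 < 5
```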

14 Ultrametric trees
There are many ways in which distance matrices can be generated. For example:
Sequence a particular gene in $n$ species and define $D_{i,j}$ as the edit distance between this gene in species $i$ and species $j$.
Sequence a particular gene in $n$ species and define $D_{i,j}$ as the alignment distance between this gene in species $i$ and species $j$.
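The first option can be sketched in a few lines of Perl. The slides do not fix the edit costs, so this sketch assumes unit-cost insertions, deletions and substitutions (Levenshtein distance); the function names and the three short sequences are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Unit-cost edit (Levenshtein) distance by dynamic programming, row by row.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,             # deletion
                           $cur[$j - 1] + 1,          # insertion
                           $prev[$j - 1] + $cost);    # match or substitution
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Distance matrix with D_ij = edit distance between sequences i and j.
sub distance_matrix {
    my @seq = @_;
    return [ map { my $s = $_; [ map { edit_distance($s, $_) } @seq ] } @seq ];
}

my $D = distance_matrix('ACGT', 'AGT', 'ACCT');
print "@$_\n" for @$D;    # one row of the matrix per line
```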

15 Ultrametric trees
There is only one unrooted binary tree topology $T$ with $n = 3$ leaves: leaves $i$, $j$ and $k$ joined at a central node $c$.
[Figure: the three-leaf tree, with edges of lengths $d^T_{i,c}$, $d^T_{j,c}$ and $d^T_{k,c}$.]
The lengths of the edges of $T$ are defined by three equations in three variables,
$$d^T_{i,c} + d^T_{j,c} = D_{i,j}, \qquad d^T_{i,c} + d^T_{k,c} = D_{i,k}, \qquad d^T_{j,c} + d^T_{k,c} = D_{j,k},$$
whose unique solution is
$$d^T_{i,c} = \frac{D_{i,j} + D_{i,k} - D_{j,k}}{2}, \qquad d^T_{j,c} = \frac{D_{i,j} + D_{j,k} - D_{i,k}}{2}, \qquad d^T_{k,c} = \frac{D_{i,k} + D_{j,k} - D_{i,j}}{2}.$$
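The closed-form solution translates directly into code. A minimal Perl sketch (the function name and the distances 7, 9 and 10 are ours); adding the returned lengths in pairs gives back $D_{i,j}$, $D_{i,k}$ and $D_{j,k}$.

```perl
use strict;
use warnings;

# Edge lengths of the three-leaf tree with centre c, solving
#   d_ic + d_jc = D_ij,  d_ic + d_kc = D_ik,  d_jc + d_kc = D_jk.
sub three_leaf_fit {
    my ($Dij, $Dik, $Djk) = @_;
    return (
        ($Dij + $Dik - $Djk) / 2,    # d^T_{i,c}
        ($Dij + $Djk - $Dik) / 2,    # d^T_{j,c}
        ($Dik + $Djk - $Dij) / 2,    # d^T_{k,c}
    );
}

my ($dic, $djc, $dkc) = three_leaf_fit(7, 9, 10);
print "$dic $djc $dkc\n";    # 3 4 6, and indeed 3 + 4 = 7, 3 + 6 = 9, 4 + 6 = 10
```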

16 Ultrametric trees
An unrooted binary tree with $n$ leaves has $2n - 3$ edges. Fitting any given tree $T$ with $n$ leaves to an $n \times n$ distance matrix $D$ therefore involves solving a system of $\binom{n}{2}$ equations in $2n - 3$ variables. For $n = 4$, this amounts to a system of six equations in only five variables, which does not always have a solution, so a tree $T$ that fits $D$ need not exist.

17 Ultrametric trees
A distance matrix $D$ is ultrametric if for every three leaves $i$, $j$, and $k$, of the three distances $D_{i,j}$, $D_{i,k}$, $D_{j,k}$ the two largest are equal (three-point condition), that is, one of the following holds:
$$D_{i,j} \le D_{i,k} = D_{j,k}, \qquad D_{i,k} \le D_{i,j} = D_{j,k}, \qquad D_{j,k} \le D_{i,j} = D_{i,k}.$$
[Figure: leaves $i$ and $j$ at distance $a$ from their common parent, which is at distance $b$ from leaf $k$.]
In the figure, $D_{i,j} = 2a \le a + b = D_{i,k} = D_{j,k}$ implies $a \le b$.
It can be determined in $O(n^3)$ time whether or not an $n \times n$ distance matrix $D$ is ultrametric.
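The $O(n^3)$ bound comes from simply checking every triple. A Perl sketch of that check: it sorts the three distances of each triple and compares the two largest. Exact comparison with `!=` assumes integer (or otherwise exactly representable) distances; the function name and both matrices are made-up examples.

```perl
use strict;
use warnings;

# Three-point condition: for every three leaves, the two largest of
# D_ij, D_ik, D_jk are equal. All triples are checked: O(n^3) time.
sub is_ultrametric {
    my ($D) = @_;
    my $n = scalar @$D;
    for my $i (0 .. $n - 1) {
        for my $j ($i + 1 .. $n - 1) {
            for my $k ($j + 1 .. $n - 1) {
                my @d = sort { $a <=> $b } ($D->[$i][$j], $D->[$i][$k], $D->[$j][$k]);
                return 0 if $d[1] != $d[2];    # the two largest differ
            }
        }
    }
    return 1;
}

my $U = [ [0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0] ];
my $N = [ [0, 2, 3], [2, 0, 4], [3, 4, 0] ];
print is_ultrametric($U), is_ultrametric($N), "\n";    # 10
```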

18 Ultrametric trees
Ultrametric distance matrices model evolutionary trees. An evolutionary tree is a rooted binary tree whose internal nodes are labeled by numbers that strictly decrease along any root-to-leaf path. For every two leaves $i$ and $j$, their distance $D_{i,j}$ is the label of the least common ancestor of species $i$ and $j$.
[Figure: an evolutionary tree with leaves A, B, C, D, E, F, G.]
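Reading $D_{i,j}$ off such a tree only needs the least common ancestor. A Perl sketch, assuming a hypothetical encoding with a parent hash and a hash of internal-node labels; the three-leaf tree is made up. It marks the ancestors of $i$ and climbs from $j$ to the first marked node.

```perl
use strict;
use warnings;

# D_{i,j} = label of the least common ancestor of leaves $i and $j.
# %$parent maps each node to its parent (the root has none) and
# %$height maps internal nodes to their labels (hypothetical encoding).
sub lca_label {
    my ($parent, $height, $i, $j) = @_;
    my %ancestor_of_i;
    for (my $v = $i; defined $v; $v = $parent->{$v}) {
        $ancestor_of_i{$v} = 1;    # $i and all its ancestors
    }
    my $v = $j;
    $v = $parent->{$v} until $ancestor_of_i{$v};    # climb to the first common node
    return $height->{$v};
}

my %parent = (A => 'x', B => 'x', x => 'r', C => 'r');
my %height = (x => 2, r => 5);    # labels decrease from the root downwards
print lca_label(\%parent, \%height, 'A', 'B'), "\n";    # 2
print lca_label(\%parent, \%height, 'A', 'C'), "\n";    # 5
```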

19 Ultrametric trees
Unweighted Pair Group Method with Arithmetic Mean (UPGMA) is an algorithm for reconstructing a tree $T$ from an ultrametric distance matrix $D$.
P. H. A. Sneath and R. R. Sokal. Numerical Taxonomy: The Principles and Practice of Numerical Classification. W. H. Freeman, San Francisco, 1973.
Starting with $n$ clusters of one element each, merge the two closest clusters until only a single cluster remains.
The distance between two disjoint clusters $C_i$ and $C_j$ is defined as the average inter-cluster pairwise distance,
$$D(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{i \in C_i} \sum_{j \in C_j} D_{i,j}.$$
The length of an edge $(u, v)$ is defined as the difference in heights of the vertices $u$ and $v$. The height plays the role of the molecular clock, and allows one to date the divergence point of every vertex in the evolutionary tree.

20 Ultrametric trees
UPGMA algorithm:

    Form n clusters, each with a single element
    Construct a graph T with a vertex v of height h(v) = 0 for each cluster
    while there is more than one cluster do
        Find the two closest clusters C_i and C_j
        Merge C_i and C_j into a new cluster C
        for every cluster C′ ≠ C do
            Set D(C, C′) to the average distance between elements of C and C′
        end for
        Add a new vertex C to T and connect it to vertices C_i and C_j
        Assign h(C) = D(C_i, C_j)/2
        Assign length h(C) − h(C_i) to edge (C_i, C)
        Assign length h(C) − h(C_j) to edge (C_j, C)
        Remove rows and columns of D corresponding to C_i and C_j
        Add a row and column to D for the new cluster C
    end while
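A compact Perl sketch of the UPGMA pseudocode, assuming leaf names plus an array-of-arrays matrix as input. It tracks cluster sizes so that the average inter-cluster distance can be updated as a size-weighted mean instead of being recomputed from the leaves, and it builds a Newick string as clusters merge. The closest-pair search is naive (ties go to the first pair found), so this sketch runs in $O(n^3)$ time rather than the $O(n^2)$ bound quoted for UPGMA; the function name and the example matrix are ours.

```perl
use strict;
use warnings;

# UPGMA sketch. $names: leaf names; $D: distance matrix (array of array
# refs). Each cluster id keeps its size, its height h and a Newick string.
sub upgma {
    my ($names, $D) = @_;
    my (%size, %height, %newick, %dist);
    my @active = (0 .. $#$names);
    for my $c (@active) {
        ($size{$c}, $height{$c}, $newick{$c}) = (1, 0, $names->[$c]);
        $dist{$c}{$_} = $D->[$c][$_] for @active;
    }
    my $next = scalar @$names;    # id of the next new cluster
    while (@active > 1) {
        # Find the two closest clusters C_i and C_j.
        my ($ci, $cj);
        for my $x (0 .. $#active) {
            for my $y ($x + 1 .. $#active) {
                my ($p, $q) = @active[$x, $y];
                ($ci, $cj) = ($p, $q)
                    if !defined $ci || $dist{$p}{$q} < $dist{$ci}{$cj};
            }
        }
        # Merge them into C with h(C) = D(C_i, C_j) / 2; edge lengths are
        # the height differences h(C) - h(C_i) and h(C) - h(C_j).
        my $c = $next++;
        $height{$c} = $dist{$ci}{$cj} / 2;
        $size{$c}   = $size{$ci} + $size{$cj};
        $newick{$c} = sprintf '(%s:%g,%s:%g)',
            $newick{$ci}, $height{$c} - $height{$ci},
            $newick{$cj}, $height{$c} - $height{$cj};
        @active = grep { $_ != $ci && $_ != $cj } @active;
        # Average distance between the elements of C and of every other C',
        # computed as the size-weighted mean of D(C_i, C') and D(C_j, C').
        for my $o (@active) {
            $dist{$c}{$o} = $dist{$o}{$c} =
                ($size{$ci} * $dist{$ci}{$o} + $size{$cj} * $dist{$cj}{$o})
                / $size{$c};
        }
        push @active, $c;
    }
    return "$newick{$active[0]};";
}

my $D = [ [0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0] ];
print upgma([qw(A B C D)], $D), "\n";    # ((A:1,B:1):2,(C:2,D:2):1);
```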

21 Ultrametric trees
Given an $n \times n$ ultrametric distance matrix $D$, the unique tree $T$ with $n$ leaves that fits the data can be reconstructed in $O(n^2)$ time using the UPGMA algorithm.
[Figure: two examples, each an ultrametric matrix over A, B, C, D and the tree reconstructed from it.]

22 Ultrametric trees
Given an $n \times n$ distance matrix $D$, the tree $T$ with $n$ leaves reconstructed by the UPGMA algorithm is not necessarily unique, nor does it necessarily fit the data, unless $D$ is ultrametric.
[Figure: a non-ultrametric matrix over A, B, C, D and the tree built from it by UPGMA.]

23 Additive and non-additive trees
A distance matrix $D$ is additive if for every four leaves $i$, $j$, $k$, and $l$, of the three sums of distances $D_{i,j} + D_{k,l}$, $D_{i,k} + D_{j,l}$, $D_{i,l} + D_{j,k}$ the two largest are equal (four-point condition), that is, one of the following holds:
$$D_{i,j} + D_{k,l} \le D_{i,k} + D_{j,l} = D_{i,l} + D_{j,k},$$
$$D_{i,k} + D_{j,l} \le D_{i,j} + D_{k,l} = D_{i,l} + D_{j,k},$$
$$D_{i,l} + D_{j,k} \le D_{i,j} + D_{k,l} = D_{i,k} + D_{j,l}.$$
[Figure: a tree on the four leaves $i$, $j$, $k$, $l$.]
It can be determined in $O(n^4)$ time whether or not an $n \times n$ distance matrix $D$ is additive.
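As with the three-point condition, the $O(n^4)$ bound is a check of every quadruple. A Perl sketch; the additive example comes from a made-up quartet tree with leaf edges 1, 2, 3, 4 and internal edge 5, and changing a single entry breaks the condition.

```perl
use strict;
use warnings;

# Four-point condition: for every four leaves, the two largest of
# D_ij + D_kl, D_ik + D_jl, D_il + D_jk are equal. O(n^4) time.
sub is_additive {
    my ($D) = @_;
    my $n = scalar @$D;
    for my $i (0 .. $n - 1) {
        for my $j ($i + 1 .. $n - 1) {
            for my $k ($j + 1 .. $n - 1) {
                for my $l ($k + 1 .. $n - 1) {
                    my @s = sort { $a <=> $b } (
                        $D->[$i][$j] + $D->[$k][$l],
                        $D->[$i][$k] + $D->[$j][$l],
                        $D->[$i][$l] + $D->[$j][$k],
                    );
                    return 0 if $s[1] != $s[2];    # the two largest differ
                }
            }
        }
    }
    return 1;
}

# Quartet tree ((i:1, j:2), (k:3, l:4)) with internal edge 5.
my $Dadd = [ [0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0] ];
# Same matrix with D_ik changed from 9 to 8: sums 10, 19, 20.
my $Dnon = [ [0, 3, 8, 10], [3, 0, 10, 11], [8, 10, 0, 7], [10, 11, 7, 0] ];
print is_additive($Dadd), is_additive($Dnon), "\n";    # 10
```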

24 Additive and non-additive trees
Neighbor Joining (NJ) is an algorithm for reconstructing a tree $T$ from an additive distance matrix $D$.
N. Saitou, M. Nei. The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution 4(4), 1987.
[Figure: neighboring leaves $i$ and $j$ with parent $k$, and another leaf $m$.]
Find neighboring leaves $i$ and $j$, and assign parent $k$ to them. Remove the rows and columns of $i$ and $j$. Add a row and column for $k$, with distance to every other leaf $m$
$$D_{k,m} = \frac{D_{i,m} + D_{j,m} - D_{i,j}}{2}.$$
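As a quick numerical check of this reduction step (illustrative numbers of our own, not taken from the slides), suppose leaves $i$ and $j$ hang from $k$ by edges of lengths 1 and 2, and the path from $k$ to a third leaf $m$ has length 8, so that $D_{i,m} = 9$, $D_{j,m} = 10$ and $D_{i,j} = 3$. Then

```latex
D_{k,m} = \frac{D_{i,m} + D_{j,m} - D_{i,j}}{2} = \frac{9 + 10 - 3}{2} = 8,
```

which is exactly the length of the path from $k$ to $m$: the shared part is counted twice in $D_{i,m} + D_{j,m}$, while the two pendant edges of $i$ and $j$ cancel against $D_{i,j}$.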

25 Additive and non-additive trees
Closest leaves (leaves $i$ and $j$ with minimum $D_{i,j}$) are not necessarily neighbors.
[Figure: a tree on leaves $i$, $j$, $k$, $l$ with edge lengths including 11 and 2.]
Leaves $i$ and $j$ are neighbors, but $D_{i,j} = 13 > 12 = D_{j,k}$.

26 Additive and non-additive trees
Starting with $n$ clusters of one element each, repeatedly merge the two clusters that are closest to each other and far apart from the rest, until only a single cluster remains.
Define the separation of cluster $C$ from the other clusters as
$$u(C) = \frac{1}{\#\text{clusters} - 2} \sum_{C' \neq C} D(C, C').$$
Simultaneously minimize $D(C_i, C_j)$ and maximize $u(C_i) + u(C_j)$, that is, minimize $D(C_i, C_j) - u(C_i) - u(C_j)$.

27 Additive and non-additive trees
NJ algorithm:

    Form n clusters, each with a single element
    Construct a graph T with an isolated vertex for each cluster
    while there is more than one cluster do
        Find clusters C_i and C_j minimizing D(C_i, C_j) − u(C_i) − u(C_j)
        Merge C_i and C_j into a new cluster C
        for every cluster C′ ≠ C do
            Set D(C, C′) to (D(C_i, C′) + D(C_j, C′) − D(C_i, C_j))/2
        end for
        Add a new vertex C to T and connect it to vertices C_i and C_j
        Assign length (D(C_i, C_j) + u(C_i) − u(C_j))/2 to edge (C_i, C)
        Assign length (D(C_i, C_j) + u(C_j) − u(C_i))/2 to edge (C_j, C)
        Remove rows and columns of D corresponding to C_i and C_j
        Add a row and column to D for the new cluster C
    end while
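A Perl sketch of this procedure, assuming leaf names plus an array-of-arrays matrix with at least three leaves. Two choices are ours: the distance update uses the reduction $D_{k,m} = (D_{i,m} + D_{j,m} - D_{i,j})/2$ from slide 24, and the loop stops at three clusters and joins them at one centre with the three-leaf formulas of slide 15, which avoids dividing by $\#\text{clusters} - 2 = 0$. The separations $u(C)$ are recomputed every round, so the sketch runs in $O(n^3)$ time; ties go to the first pair found. The example reuses a made-up quartet tree (leaf edges 1 to 4, internal edge 5).

```perl
use strict;
use warnings;

# Neighbor-Joining sketch. $names: leaf names (at least three); $D: an
# additive distance matrix (array of array refs). Returns a Newick string.
sub neighbor_joining {
    my ($names, $D) = @_;
    my (%dist, %newick);
    my @active = (0 .. $#$names);
    for my $c (@active) {
        $newick{$c} = $names->[$c];
        $dist{$c}{$_} = $D->[$c][$_] for @active;
    }
    my $next = scalar @$names;
    while (@active > 3) {
        my $r = scalar @active;
        # Separation u(C): sum of D(C, C') over the other clusters, / (r - 2).
        my %u;
        for my $c (@active) {
            my $sum = 0;
            $sum += $dist{$c}{$_} for grep { $_ != $c } @active;
            $u{$c} = $sum / ($r - 2);
        }
        # Choose C_i, C_j minimising D(C_i, C_j) - u(C_i) - u(C_j).
        my ($ci, $cj, $best);
        for my $x (0 .. $#active) {
            for my $y ($x + 1 .. $#active) {
                my ($p, $q) = @active[$x, $y];
                my $score = $dist{$p}{$q} - $u{$p} - $u{$q};
                ($ci, $cj, $best) = ($p, $q, $score)
                    if !defined $best || $score < $best;
            }
        }
        # New cluster C, with the edge lengths of the pseudocode.
        my $c  = $next++;
        my $li = ($dist{$ci}{$cj} + $u{$ci} - $u{$cj}) / 2;
        my $lj = ($dist{$ci}{$cj} + $u{$cj} - $u{$ci}) / 2;
        $newick{$c} = sprintf '(%s:%g,%s:%g)', $newick{$ci}, $li, $newick{$cj}, $lj;
        @active = grep { $_ != $ci && $_ != $cj } @active;
        # D(C, C') = (D(C_i, C') + D(C_j, C') - D(C_i, C_j)) / 2.
        for my $o (@active) {
            $dist{$c}{$o} = $dist{$o}{$c} =
                ($dist{$ci}{$o} + $dist{$cj}{$o} - $dist{$ci}{$cj}) / 2;
        }
        push @active, $c;
    }
    # Join the last three clusters at one centre (three-leaf formulas).
    my ($x, $y, $z) = @active;
    my ($dxy, $dxz, $dyz) = ($dist{$x}{$y}, $dist{$x}{$z}, $dist{$y}{$z});
    return sprintf '(%s:%g,%s:%g,%s:%g);',
        $newick{$x}, ($dxy + $dxz - $dyz) / 2,
        $newick{$y}, ($dxy + $dyz - $dxz) / 2,
        $newick{$z}, ($dxz + $dyz - $dxy) / 2;
}

my $D = [ [0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0] ];
print neighbor_joining([qw(i j k l)], $D), "\n";    # (k:3,l:4,(i:1,j:2):5);
```

On this additive input the sketch recovers every edge length of the quartet tree, including the internal edge of length 5.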

28 Additive and non-additive trees
Given an $n \times n$ additive distance matrix $D$, the unique tree $T$ with $n$ leaves that fits the data can be reconstructed in $O(n^5)$ time using the NJ algorithm.
[Figure: two examples, each an additive matrix over A, B, C, D and the tree reconstructed from it.]

29 Additive and non-additive trees
Given an $n \times n$ distance matrix $D$, the tree $T$ with $n$ leaves reconstructed by the NJ algorithm does not necessarily fit the data unless $D$ is additive.
[Figure: a non-additive matrix over A, B, C, D and the tree built from it by NJ.]

30 Additive and non-additive trees
N. Saitou, M. Nei. The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution 4(4), 1987. Running time $O(n^5)$.
J. A. Studier, K. J. Keppler. A Note on the Neighbor-Joining Algorithm of Saitou and Nei. Molecular Biology and Evolution 5(6), 1988. Running time $O(n^3)$.
Richard Durbin (Sanger Centre), Sean R. Eddy, Anders Krogh, Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. Appendix 7.8: Proof of neighbour-joining theorem.
T. Mailund, G. S. Brodal, R. Fagerberg, C. N. S. Pedersen, D. Phillips. Recrafting the Neighbor-Joining Method. BMC Bioinformatics 7:29, 2006. Running time $O(n^3)$.

31 Additive and non-additive trees
J. E. Stajich and twenty others. The BioPerl Toolkit: Perl Modules for the Life Sciences. Genome Research 12(10).
BioPerl can be installed from CPAN or, on Debian-based systems, with the package manager:

    perl -MCPAN -e "install Bundle::BioPerl"
    sudo apt-get install bioperl

32 Additive and non-additive trees
The distance matrix can be parsed from a file in Phylip format:

    use Bio::Matrix::IO;
    my $filename = $ARGV[0];
    my $parser = Bio::Matrix::IO->new(-format => 'phylip', -file => $filename);
    my $mat = $parser->next_matrix;

33 Additive and non-additive trees
The phylogenetic tree can be reconstructed using Neighbor Joining (NJ) or the Unweighted Pair Group Method with Arithmetic Mean (UPGMA):

    use Bio::Tree::DistanceFactory;
    my $method = $ARGV[1];    # 'UPGMA' or 'NJ'
    my $dfactory = Bio::Tree::DistanceFactory->new(-method => $method);
    my $tree = $dfactory->make_tree($mat);

34 Additive and non-additive trees
The phylogenetic tree can be written out in Newick format:

    use Bio::TreeIO;
    my $output = Bio::TreeIO->new(-format => 'newick');
    $output->write_tree($tree);

35 Additive and non-additive trees
The phylogenetic tree can also be drawn as a rectangular cladogram (an Encapsulated PostScript file):

    use Bio::Tree::Draw::Cladogram;
    use Bio::TreeIO;
    my $input  = $ARGV[0];
    my $output = $ARGV[1];
    my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $input);
    my $tree   = $treeio->next_tree;
    my $obj    = Bio::Tree::Draw::Cladogram->new(-tree => $tree, -compact => 0);
    $obj->print(-file => $output);

36 Additive and non-additive trees
Input: a distance matrix in Phylip format over 7 taxa (brown, polar, black, spectacled, giant, raccoon, red).
Output: the phylogenetic tree reconstructed by UPGMA, in Newick format:

    (((((brown:,polar:):,black:):,spectacled:):,giant:):,(red:,raccoon:):);

37 Additive and non-additive trees
Input: the same distance matrix in Phylip format over 7 taxa.
Output: the phylogenetic tree reconstructed by NJ, in Newick format:

    (brown:,polar:,(black:,(spectacled:,(giant:,(raccoon:,red:):):):):);

38 Additive and non-additive trees
[Figure: the evolutionary tree of brown bear, polar bear, black bear, spectacled bear, giant panda, raccoon and red panda.]

39 Additive and non-additive trees
[Figure: the UPGMA tree above drawn as a cladogram (two renderings), with leaves in the order brown, polar, black, spectacled, giant, red, raccoon.]

40 Additive and non-additive trees
[Figure: the NJ tree above drawn as a cladogram (two renderings), with leaves in the order brown, polar, black, spectacled, giant, raccoon, red.]

41 Perfect phylogenies
Given an $n \times m$ genomic matrix $M$, find a tree $T$ with $n$ leaves that fits the data.
[Figure: example with lamprey, salmon, shark and lizard.]

42 Perfect phylogenies
[Figure: genomic matrix recording the characters paired fins, jaws, large dermal bones, fin rays, lungs and rasping tongue for lamprey, shark, salmon and lizard, with the corresponding tree.]

43 Perfect phylogenies
Let $M$ be an $n \times m$ genomic matrix. Biological interpretation (cladistics): $n$ taxa and $m$ cladistic characters, each with two states, 0 (absent) and 1 (present); the characters are unordered.

44 Perfect phylogenies
Let $M$ be an $n \times m$ genomic matrix. Biological interpretation (genomics): $n$ sequences and $m$ sites, possibly SNP sites, each with two states, 0 and 1; the sites are ordered (on the chromosome).

45 Perfect phylogenies
Dan Gusfield. Efficient Algorithms for Inferring Evolutionary Trees. Networks 21(1):19–28, 1991.
Theorem:

    Radix sort the columns of M in decreasing order, reading each column as a binary number
    Transform M into M′ by removing repeated columns
    Let O be the set of entries of M′ with value 1
    for each (i, j) ∈ O do
        set L(i, j) to the largest index k < j such that (i, k) ∈ O
        set L(i, j) to 0 if there is no such index k
    end for
    for each 1 ≤ j ≤ m do
        set L(j) to the largest L(i, j) such that (i, j) ∈ O
    end for

M has a phylogenetic tree if and only if $L(i, j) = L(j)$ for every $(i, j) \in O$.
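A sketch of this test in plain Perl, assuming the matrix is given as an array of 0/1 rows. Reading each column top-down as a fixed-length binary string lets a descending string sort stand in for the radix sort (same order, but with comparison sorting rather than linear time). The function name and both example matrices are ours; in the second one, columns {A, B} and {A, C} overlap without nesting.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Gusfield's test for a binary matrix $M (array of row array refs):
# returns 1 if M has a phylogenetic tree, 0 otherwise.
sub has_perfect_phylogeny {
    my ($M) = @_;
    my ($n, $m) = (scalar @$M, scalar @{ $M->[0] });
    # Column j read top-down as a binary string; for strings of equal
    # length, decreasing string order is decreasing numeric order.
    my %seen;
    my @cols = grep { !$seen{$_}++ }    # M': drop repeated columns
        sort { $b cmp $a }
        map { my $j = $_; join '', map { $M->[$_][$j] } 0 .. $n - 1 } 0 .. $m - 1;
    # L(i, j) for every 1-entry of M' (columns numbered from 1), and L(j).
    my (%Lij, %Lj);
    for my $i (0 .. $n - 1) {
        my $last = 0;    # largest k < j with M'(i, k) = 1, or 0
        for my $j (1 .. @cols) {
            next unless substr($cols[$j - 1], $i, 1) eq '1';
            $Lij{"$i,$j"} = $last;
            $Lj{$j} = max($Lj{$j} // 0, $last);
            $last = $j;
        }
    }
    # M has a phylogenetic tree iff L(i, j) = L(j) for every (i, j) in O.
    for my $key (keys %Lij) {
        my (undef, $j) = split /,/, $key;
        return 0 if $Lij{$key} != $Lj{$j};
    }
    return 1;
}

my @M = ([1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 1], [0, 0, 1, 1, 0], [0, 1, 0, 0, 0]);
print has_perfect_phylogeny(\@M) ? "tree\n" : "no tree\n";
print has_perfect_phylogeny([[1, 1], [1, 0], [0, 1], [0, 0]]) ? "tree\n" : "no tree\n";
```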

46 Perfect phylogenies
Construction of the tree:

    Create a node n_j for each column j of M′
    for each node n_j with L(j) > 0 do
        Make node n_L(j) the parent of node n_j
        Label the edge with j and the indexes of all columns identical to j
    end for
    Create a root node r
    for each node n_j with L(j) = 0 do
        Make node r the parent of node n_j
        Label the edge with j and the indexes of all columns identical to j
    end for

47 Perfect phylogenies

    for each 1 ≤ i ≤ n do
        Let c_i be the largest index such that M′[i, c_i] = 1
        Let (n_j, n_k) be the edge labeled with c_i
        if node n_k is a leaf then
            Label node n_k with i
        else
            Create a leaf node n_l
            Make node n_k the parent of node n_l
            Label node n_l with i
        end if
    end for

Theorem: the resulting tree T is a phylogenetic tree for the genomic matrix M.
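The construction of slides 46 and 47 can be sketched in Perl under the same assumptions as the test sketch (0/1 rows, columns sorted as binary strings). To stay short, this sketch prints only the topology in Newick format, without the column labels on the edges. Two behaviours are our own choices: a taxon whose row has no 1 hangs directly from the root, a case the slide does not mention, and identical rows end up as a cherry below the same node. The function name and the example matrix are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Tree construction for a binary matrix $M (array of row array refs) with
# taxon names $names, assuming M already passed the perfect phylogeny test.
sub build_perfect_phylogeny {
    my ($M, $names) = @_;
    my ($n, $m) = (scalar @$M, scalar @{ $M->[0] });
    # M': distinct columns as binary strings, in decreasing order.
    my %seen;
    my @cols = grep { !$seen{$_}++ } sort { $b cmp $a }
        map { my $j = $_; join '', map { $M->[$_][$j] } 0 .. $n - 1 } 0 .. $m - 1;
    # L(j) for every column of M', and c_i = largest column with a 1 in row i.
    my (@L, @c);
    for my $i (0 .. $n - 1) {
        my $last = 0;
        for my $j (1 .. @cols) {
            next unless substr($cols[$j - 1], $i, 1) eq '1';
            $L[$j] = max($L[$j] // 0, $last);
            $last = $j;
        }
        $c[$i] = $last;    # 0 if row i has no 1: its leaf hangs from r
    }
    # Node 0 is the root r and node j is n_j; n_j hangs from n_L(j), or from
    # r when L(j) = 0. Taxon i is attached at node n_{c_i}.
    my (%kids, %taxa);
    push @{ $kids{ $L[$_] } }, $_ for grep { defined $L[$_] } 1 .. @cols;
    push @{ $taxa{ $c[$_] } }, $names->[$_] for 0 .. $n - 1;
    my $newick;
    $newick = sub {
        my ($v) = @_;
        my @k = @{ $kids{$v} // [] };
        my @t = @{ $taxa{$v} // [] };
        # A node n_k without column children is itself the leaf of its taxon.
        return $t[0] if $v && !@k && @t == 1;
        return '(' . join(',', (map { $newick->($_) } @k), @t) . ')';
    };
    return $newick->(0) . ';';
}

my @M = ([1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 1], [0, 0, 1, 1, 0], [0, 1, 0, 0, 0]);
print build_perfect_phylogeny(\@M, [qw(A B C D E)]), "\n";    # (((C,A),E),(D,B));
```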

48 Perfect phylogenies
Given an $n \times m$ genomic matrix $M$, the unique tree $T$ with $n$ leaves that fits the data, if it exists, can be reconstructed in $O(nm)$ time using the previous algorithm.
[Figure: example tree with leaves A, B, C, D and E, and edges labeled by column indexes.]

49 Perfect phylogenies
Example (test M for a phylogenetic tree): [Figure: matrix M, column permutation Π, and the values L(i, j) and L(j).]
M has a phylogenetic tree, because $L(i, j) = L(j)$ for every $(i, j) \in O$.

50 Perfect phylogenies
Example (test M for a phylogenetic tree): [Figure: another matrix M, with Π, L(i, j) and L(j).]
M has no phylogenetic tree, because $L(1, 2) \neq L(2)$, and also because $L(4, 3) \neq L(3)$.

51 Perfect phylogenies
Example (build a phylogenetic tree T for M): [Figure: matrix M, column permutation Π and L, with one node n_1 to n_5 per column.]

52 Perfect phylogenies
[Figure: the column nodes connected according to L, on edges labeled by column indexes.]

53 Perfect phylogenies
[Figure: a root node r added above the remaining column nodes.]

54 Perfect phylogenies
[Figure: leaves D and C placed in the tree.]

55 Perfect phylogenies
[Figure: leaves B, E and A placed in the tree.]

56 Perfect phylogenies
[Figure: the resulting phylogenetic tree on A, B, C, D and E.]

57 Perfect phylogenies
The genomic matrix can be parsed from a file in Phylip format:

    use Bio::Matrix::IO;
    my $filename = $ARGV[0];
    my $parser = Bio::Matrix::IO->new(-format => 'phylip', -file => $filename);
    my $mat = $parser->next_matrix;

The phylogenetic tree, if it exists, can then be reconstructed using the previous algorithm.
Exercise: Implement a Bio::Tree::PerfectPhylogeny module.

58 Taxonomic reconstruction: Compatibility
Compatible trees with overlapping taxa can be combined into a single supertree containing the evolutionary information of the given trees. Incompatible trees cannot all be included in a common supertree.
Two or more phylogenetic trees with nested taxa are ancestrally compatible if they can be refined into a common supertree.
Two or more phylogenetic trees with nested taxa are perfectly compatible if there exists a common supertree whose topological restriction to the taxa in each tree is isomorphic to that tree.
Philip Daniel, Charles Semple. Supertree Algorithms for Nested Taxa. In: Olaf R. P. Bininda-Emonds (ed.), Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Computational Biology, vol. 4, chap. 7. Kluwer (2004).
Charles Semple, Philip Daniel, Wim Hordijk, Roderic D. M. Page, Mike Steel. Supertree Algorithms for Ancestral Divergence Dates and Nested Taxa. Bioinformatics 20(15) (2004).

59 Taxonomic reconstruction: Compatibility
Example: two compatible phylogenetic trees obtained from TreeBASE.
[Figure: one tree with labels Loganiaceae, Rubiaceae, Viburnum, Columellia and Caprifoliaceae; another with labels Viburnum, Columellia, Heptacodium, Diervilla and Linnaea.]

60 Taxonomic reconstruction: Compatibility
Incompatible phylogenetic trees can still be partially combined into a maximum agreement subtree.
Mike A. Steel, Tandy Warnow. Kaikoura Tree Theorems: Computing the Maximum Agreement Subtree. Information Processing Letters 48(2) (1993).
Compatible phylogenetic trees can be combined into a common supertree.
B. R. Baum. Combining Trees as a Way of Combining Datasets for Phylogenetic Inference, and the Desirability of Combining Gene Trees. Taxon 41(1), 3–10 (1992).
M. A. Ragan. Phylogenetic Inference based on Matrix Representation of Trees. Molecular Phylogenetics and Evolution 1(1) (1992).
Charles Semple, Mike A. Steel. A Supertree Method for Rooted Trees. Discrete Applied Mathematics 105(1–3) (2000).
Roderic D. M. Page. Modified Mincut Supertrees. In: Proc. 2nd Int. Workshop on Algorithms in Bioinformatics, Lecture Notes in Computer Science, vol. 2452. Springer-Verlag (2002).

61 Taxonomic reconstruction: Compatibility
Definition. Let $A$ be a fixed set of labels. A node of a rooted tree with only one child is an elementary node. A semi-labeled tree over $A$ is a rooted tree with some of its nodes, including all its leaves and all its elementary nodes, injectively labeled in the set $A$. An A-tree is a rooted tree with some of its nodes, including all its leaves, injectively labeled in the set $A$. The set of the labels of the leaves of an A-tree $T$ is denoted by $L(T)$, and the set of the labels of all its nodes is denoted by $A(T)$.
Example: [Figure: a semi-labeled tree (left) and an A-tree (right), with labels among A, C, D, H, L, V and X.]

62 Taxonomic reconstruction: Compatibility
Definition. Let $T$ be an A-tree. For every $v \in V(T)$, the cluster of $v$ in $T$ is the set $A_T(v)$ of the labels of all its descendants, including itself. The cluster representation of $T$ is $C_A(T) = \{A_T(v) \mid v \in V(T)\}$.
Example: the cluster representation of the A-tree [Figure: an A-tree with labels A, C, D, H, L and V] is $\{\{C\}, \{D\}, \{H\}, \{L\}, \{V\}, \{C, V\}, \{D, L\}, \{A, D, H, L\}, \{A, C, D, H, L, V\}\}$.
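A sketch of computing $C_A(T)$ in plain Perl, assuming a hypothetical encoding of an A-tree as a children hash plus a label hash (unlabeled internal nodes simply have no label entry). The example tree is our reading of the A-tree on this slide, reconstructed from its stated cluster representation; the printed clusters reproduce that list.

```perl
use strict;
use warnings;

# Cluster representation of an A-tree. %$kids maps a node id to an array
# ref of its children and %$label maps labeled node ids to their labels
# (hypothetical encoding). Returns node id => sorted array ref of labels.
sub cluster_representation {
    my ($kids, $label, $root) = @_;
    my %cluster;
    my $visit;
    $visit = sub {
        my ($v) = @_;
        # A_T(v): the label of v itself (if any) plus those of its descendants.
        my @labels = defined $label->{$v} ? ($label->{$v}) : ();
        push @labels, @{ $visit->($_) } for @{ $kids->{$v} // [] };
        return $cluster{$v} = [ sort @labels ];
    };
    $visit->($root);
    return \%cluster;
}

sub fmt { return '{' . join(',', @{ $_[0] }) . '}' }    # print a cluster as {A,B}

# r -> (x, a), x -> (c, v), a -> (h, y), y -> (d, l); only a and the leaves are labeled.
my %kids  = (r => [qw(x a)], x => [qw(c v)], a => [qw(h y)], y => [qw(d l)]);
my %label = (a => 'A', c => 'C', v => 'V', h => 'H', d => 'D', l => 'L');
my $cl = cluster_representation(\%kids, \%label, 'r');
print join(' ', map { fmt($_) } sort { @$a <=> @$b or "@$a" cmp "@$b" } values %$cl), "\n";
```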

63 Taxonomic reconstruction: Compatibility
Definition. The restriction $T|_X$ of an A-tree $T$ to a set $X \subseteq A$ of labels is the subtree of $T$ supported on the set of nodes $V(T|_X) = \{v \in V(T) \mid A_T(v) \cap X \neq \emptyset\}$, where a node is labeled when it is labeled in $T$ and its label belongs to $X$, in which case its label in $T|_X$ is the same as in $T$.
Example: [Figure: an A-tree (left) and its restriction to the set of labels {A, C, D} (right).]

64 Taxonomic reconstruction: Compatibility
Theorem (Llabrés, Rocha, Rosselló, Valiente 2006). Let $T_1$ and $T_2$ be two A-trees with $A(T_1) = A(T_2)$. Then $T_1$ and $T_2$ are ancestrally compatible if and only if $C_A(T_1)$ and $C_A(T_2)$ jointly satisfy the following two conditions:
For every label $a \in A(T_1) = A(T_2)$, the smallest member of $C_A(T_1)$ containing $a$ is equal to the smallest member of $C_A(T_2)$ containing $a$.
For every $X \in C_A(T_1)$ and $Y \in C_A(T_2)$, if $X \cap Y \neq \emptyset$, then $X \subseteq Y$ or $Y \subseteq X$.
Mercè Llabrés, Jairo Rocha, Francesc Rosselló, Gabriel Valiente. On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa. To appear in J. Math. Biol. (2006).

65 Taxonomic reconstruction: Compatibility
Example: two incompatible phylogenetic trees obtained from TreeBASE.
[Figure: two trees on the same 19 taxa: Convallaria, Peliosanthes, Geitonoplesium, Phormium, Herreria, Asparagus, Ruscus, Uvularia, Tricyrtis, Trillium, Alstroemeria, Luzuriaga, Philesia, Dioscoreaceae, Smilax, Stemonaceae, Ripogonum, Petermannia and Taccaceae.]

66 Taxonomic reconstruction: Compatibility
Example: two incompatible semi-labeled trees obtained from TreeBASE, one of which has a cluster labeled by a taxon of the other tree.
[Figure: two trees with labels Vernonia, Asteroideae, Blumea, Inula, Gnaphalium and Antennaria.]

67 Taxonomic reconstruction: Compatibility
Example: two incompatible semi-labeled trees obtained from TreeBASE, in which an incompatible triple of labels involves three taxa in one tree and two taxa plus one internal label in the other tree.
[Figure: two trees with labels Loganiaceae, Rubiaceae, Viburnum, Columellia, Caprifoliaceae, Heptacodium, Diervilla and Linnaea, shown next to the compatible pair of slide 59.]

68 Taxonomic reconstruction: Compatibility
Algorithm (ancestral compatibility):

    A := A(T1) ∩ A(T2); T1 := T1|A; T2 := T2|A
    for each label a ∈ A do
        let X1 be the smallest member of C_A(T1) containing a
        let X2 be the smallest member of C_A(T2) containing a
        if X1 ≠ X2 then
            return "X1 and X2 are incompatible"
        end if
    end for
    for each cluster X1 ∈ C_A(T1) do
        for each cluster X2 ∈ C_A(T2) do
            if X1 ∩ X2 ≠ ∅ and X1 ⊈ X2 and X2 ⊈ X1 then
                return "X1 and X2 are incompatible"
            end if
        end for
    end for
    return "T1 and T2 are compatible"
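A quadratic-time sketch of this algorithm in plain Perl, using a hypothetical children-hash/label-hash encoding of A-trees. It computes the restriction to $A$ directly on clusters, using the fact that the cluster of a node in $T|_A$ is its cluster in $T$ intersected with $A$ (nodes whose intersection is empty disappear). The three small trees are made up: `$t2` is a coarser version of `$t1`, while `$t3` places C, rather than D and L, below A. BioPerl's own implementation is the Bio::Tree::Compatible module used on the following slides.

```perl
use strict;
use warnings;

# Clusters of all nodes below $v, stored in %$out as sorted label lists.
sub clusters_of {
    my ($kids, $label, $v, $out) = @_;
    my @labels = defined $label->{$v} ? ($label->{$v}) : ();
    push @labels, @{ clusters_of($kids, $label, $_, $out) } for @{ $kids->{$v} // [] };
    return $out->{$v} = [ sort @labels ];
}

# Trees are [ \%kids, \%label, $root ]. Returns an empty list if the trees
# are ancestrally compatible, or a pair of incompatible clusters otherwise.
sub ancestral_compatibility {
    my @trees = @_;
    # A := A(T1) intersect A(T2)
    my %in1 = map { $_ => 1 } values %{ $trees[0][1] };
    my %A   = map { $_ => 1 } grep { $in1{$_} } values %{ $trees[1][1] };
    my (@R, @small);
    for my $t (0, 1) {
        my ($kids, $label, $root) = @{ $trees[$t] };
        my %cl;
        clusters_of($kids, $label, $root, \%cl);
        # Restriction to A: clusters intersected with A, empty ones dropped.
        my %restricted;
        for my $v (keys %cl) {
            my @x = grep { $A{$_} } @{ $cl{$v} };
            $restricted{$v} = \@x if @x;
        }
        $R[$t] = [ values %restricted ];
        # Smallest cluster containing label a: the cluster of the node labeled a.
        my %node_of = reverse %$label;
        $small[$t]{$_} = $restricted{ $node_of{$_} } for keys %A;
    }
    for my $lab (sort keys %A) {
        my ($x1, $x2) = ($small[0]{$lab}, $small[1]{$lab});
        return ($x1, $x2) if "@$x1" ne "@$x2";
    }
    for my $x1 (@{ $R[0] }) {
        my %in = map { $_ => 1 } @$x1;
        for my $x2 (@{ $R[1] }) {
            my $common = grep { $in{$_} } @$x2;    # size of X1 intersect X2
            return ($x1, $x2) if $common && $common < @$x1 && $common < @$x2;
        }
    }
    return;
}

my %label = (a => 'A', c => 'C', v => 'V', h => 'H', d => 'D', l => 'L');
my $t1 = [ { r => [qw(x a)], x => [qw(c v)], a => [qw(h y)], y => [qw(d l)] }, \%label, 'r' ];
my $t2 = [ { r => [qw(a c v)], a => [qw(h d l)] }, \%label, 'r' ];
my $t3 = [ { r => [qw(a d l v)], a => [qw(h c)] }, \%label, 'r' ];
for my $other ($t2, $t3) {
    my @bad = ancestral_compatibility($t1, $other);
    my $msg = @bad ? "incompatible: {@{$bad[0]}} vs {@{$bad[1]}}\n" : "compatible\n";
    print $msg;
}
```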

69 Taxonomic reconstruction: Compatibility
The A-trees can be parsed from a file in Newick format, and they can be tested for ancestral compatibility:

    use Bio::Tree::Compatible;
    use Bio::TreeIO;
    my $filename = $ARGV[0];
    my $input = Bio::TreeIO->new(-format => 'newick', -file => $filename);
    my $t1 = $input->next_tree;
    my $t2 = $input->next_tree;
    my ($incompat, $ilabels, $inodes) = $t1->Bio::Tree::Compatible::is_compatible($t2);

70 Taxonomic reconstruction: Compatibility
The cluster representation of the trees is the basis for a certificate of incompatibility:

    if ($incompat) {
        print "the trees are incompatible\n";
        my %cluster1 = %{ $t1->Bio::Tree::Compatible::cluster_representation };
        my %cluster2 = %{ $t2->Bio::Tree::Compatible::cluster_representation };

71 Taxonomic reconstruction: Compatibility
$T_1$ and $T_2$ are incompatible if, for some label $a \in A(T_1) = A(T_2)$, the smallest members of $C_A(T_1)$ and of $C_A(T_2)$ containing $a$ differ:

        if (scalar(@$ilabels)) {
            foreach my $label (@$ilabels) {
                my $n1 = $t1->find_node(-id => $label);
                my $n2 = $t2->find_node(-id => $label);
                my @c1 = sort @{ $cluster1{$n1} };
                my @c2 = sort @{ $cluster2{$n2} };
                print "label $label";
                print " cluster"; print " $_" for @c1;
                print " cluster"; print " $_" for @c2;
                print "\n";
            }
        }

72 Taxonomic reconstruction: Compatibility
$T_1$ and $T_2$ are incompatible if, for some $X \in C_A(T_1)$ and $Y \in C_A(T_2)$, clusters $X$ and $Y$ overlap but neither is contained in the other:

        if (scalar(@$inodes)) {
            while (@$inodes) {
                my $n1 = shift @$inodes;
                my $n2 = shift @$inodes;
                my @c1 = sort @{ $cluster1{$n1} };
                my @c2 = sort @{ $cluster2{$n2} };
                print "cluster"; print " $_" for @c1;
                print " properly intersects cluster"; print " $_" for @c2;
                print "\n";
            }
        }
    } else {
        print "the trees are compatible\n";
    }

73 Taxonomic reconstruction: Compatibility
Lemma. The ancestral compatibility algorithm takes time quadratic in the size of the trees.
Proof. The size of the cluster representation is bounded by the size of the tree, and the bound is tight in the worst case.
Exercise: Improve the ancestral compatibility algorithm to run in time linear in the size of the trees.
Exercise: Implement a linear-time Bio::Tree::Compatible module.

74 Taxonomic reconstruction: Consensus
Definition. A path $(v_0, v_1, \ldots, v_k)$ in an A-tree $T$ is elementary if, for every $i = 1, \ldots, k-1$, $v_{i+1}$ is the only child of $v_i$; in other words, if all its intermediate nodes have out-degree 1. In particular, an arc forms an elementary path.
Example: [Figure: an A-tree with nodes numbered from 1.] The path from 1 to 5 is not elementary.

75 Taxonomic reconstruction: Consensus
Definition. Two non-trivial paths $(a, v_1, \ldots, v_k)$ and $(a, w_1, \ldots, w_l)$ in an A-tree $T$ diverge if their origin $a$ is their only common node.
Example: the paths from 1 to 5 and from 1 to 6 do not diverge.
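Both definitions are easy to check mechanically. A sketch in plain Perl, assuming paths are listed as node ids from their origin downwards and the tree is a hypothetical children hash; the small tree is made up and is not the tree of the figure.

```perl
use strict;
use warnings;

# A path (array ref of node ids, origin first) is elementary if every
# intermediate node has exactly one child. %$kids maps node => children.
sub is_elementary {
    my ($kids, $path) = @_;
    for my $i (1 .. $#$path - 1) {
        return 0 unless @{ $kids->{ $path->[$i] } // [] } == 1;
    }
    return 1;    # in particular, a single arc is elementary
}

# Two non-trivial paths with the same origin diverge if the origin is
# their only common node.
sub diverge {
    my ($p, $q) = @_;
    return 0 unless $p->[0] eq $q->[0];
    my %on_p = map { $_ => 1 } @$p[1 .. $#$p];
    my $shared = grep { $on_p{$_} } @$q[1 .. $#$q];
    return $shared ? 0 : 1;
}

my %kids = (1 => [2, 3], 3 => [4], 4 => [5, 6]);    # a made-up tree
print is_elementary(\%kids, [1, 3, 4]), is_elementary(\%kids, [1, 3, 4, 5]), "\n";    # 10
print diverge([1, 2], [1, 3, 4]), diverge([1, 3, 4, 5], [1, 3, 4, 6]), "\n";          # 10
```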

76 Taxonomic reconstruction: Consensus
Definition. An A-tree $S$ is a minor of an A-tree $T$ if there exists an injective mapping $f : V(S) \to V(T)$ satisfying the following condition: for every $a, b \in V(S)$, if $(a, b) \in E(S)$, then there exists a path $f(a) \rightsquigarrow f(b)$ in $T$ with no intermediate node in $f(V(S))$. In this case, $f$ is said to be a minor embedding $f : S \to T$.
Example: [Figure: a tree $S$ with root $r$ and children $x$ and $y$, and a tree $T$ with nodes numbered from 1.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 3$ and $f(y) = 4$ is not a minor embedding because, although it transforms arcs in $S$ into paths in $T$, the path $f(r) \rightsquigarrow f(y)$ contains the node $3 = f(x)$, which belongs to $f(V(S))$.

77 Taxonomic reconstruction Consensus Definition An A-tree $S$ is a minor of an A-tree $T$ if there exists an injective mapping $f : V(S) \to V(T)$ satisfying the following condition: for every $a, b \in V(S)$, if $(a, b) \in E(S)$, then there exists a path $f(a) \rightsquigarrow f(b)$ in $T$ with no intermediate node in $f(V(S))$. In this case, the mapping $f$ is said to be a minor embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 5$ and $f(y) = 6$ is a minor embedding, because the arcs $(r, x), (r, y) \in E(S)$ become paths $f(r) \rightsquigarrow f(x)$ and $f(r) \rightsquigarrow f(y)$ in $T$ with no intermediate node in $f(V(S))$.

78 Taxonomic reconstruction Consensus Definition An A-tree $S$ is a minor of an A-tree $T$ if there exists an injective mapping $f : V(S) \to V(T)$ satisfying the following condition: for every $a, b \in V(S)$, if $(a, b) \in E(S)$, then there exists a path $f(a) \rightsquigarrow f(b)$ in $T$ with no intermediate node in $f(V(S))$. In this case, the mapping $f$ is said to be a minor embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 5$ and $f(y) = 6$ is not a topological embedding, because the paths $f(r) \rightsquigarrow f(x)$ and $f(r) \rightsquigarrow f(y)$ do not diverge.

79 Taxonomic reconstruction Consensus Definition An A-tree $S$ is a topological subtree of an A-tree $T$ if there exists a minor embedding $f : S \to T$ such that, for every $(a, b), (a, c) \in E(S)$ with $b \neq c$, the paths $f(a) \rightsquigarrow f(b)$ and $f(a) \rightsquigarrow f(c)$ in $T$ diverge. In this case, $f$ is called a topological embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 6$ is a topological embedding, because the arcs $(r, x), (r, y) \in E(S)$ become divergent paths $f(r) \rightsquigarrow f(x)$ and $f(r) \rightsquigarrow f(y)$ in $T$ without intermediate nodes in $f(V(S))$.
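
A minimal Perl sketch of the topological-embedding test: a minor-embedding check followed by a divergence check for every pair of distinct arcs leaving the same node of $S$. It assumes the reconstructed $T$ (arcs (1,2), (1,3), (3,4), (4,5), (4,6); the figure is lost) and $S$ with arcs $(r, x)$, $(r, y)$; all function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of T (arcs (1,2), (1,3), (3,4), (4,5), (4,6))
# and of S (arcs (r,x), (r,y)).
my %parent = (2 => 1, 3 => 1, 4 => 3, 5 => 4, 6 => 4);
my @s_arcs = ([ 'r', 'x' ], [ 'r', 'y' ]);

# The path u ~> v as an array ref of nodes, or undef if u is not an ancestor of v.
sub path {
    my ($u, $v) = @_;
    my @p = ($v);
    while ($p[0] ne $u) {
        return undef unless exists $parent{ $p[0] };
        unshift @p, $parent{ $p[0] };
    }
    return \@p;
}

# Injective, and arcs become paths with no intermediate node in f(V(S)).
sub is_minor_embedding {
    my (%f) = @_;
    my %image;
    for my $s (keys %f) {
        return 0 if $image{ $f{$s} }++;
    }
    for my $arc (@s_arcs) {
        my $p = path($f{ $arc->[0] }, $f{ $arc->[1] }) or return 0;
        for my $i (1 .. $#$p - 1) {
            return 0 if $image{ $p->[$i] };
        }
    }
    return 1;
}

# Topological: a minor embedding under which the paths of any two distinct arcs
# leaving the same node of S diverge (share only their origin).
sub is_topological_embedding {
    my (%f) = @_;
    return 0 unless is_minor_embedding(%f);
    for my $i (0 .. $#s_arcs) {
        for my $j ($i + 1 .. $#s_arcs) {
            my ($u1, $v1) = @{ $s_arcs[$i] };
            my ($u2, $v2) = @{ $s_arcs[$j] };
            next unless $u1 eq $u2;
            my %on_p = map { $_ => 1 } @{ path($f{$u1}, $f{$v1}) };
            my $common = grep { $on_p{$_} } @{ path($f{$u2}, $f{$v2}) };
            return 0 if $common > 1;
        }
    }
    return 1;
}

print "f = (1,2,6): ", is_topological_embedding(r => 1, x => 2, y => 6), "\n";
print "f = (1,5,6): ", is_topological_embedding(r => 1, x => 5, y => 6), "\n";
```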

80 Taxonomic reconstruction Consensus Definition An A-tree $S$ is a topological subtree of an A-tree $T$ if there exists a minor embedding $f : S \to T$ such that, for every $(a, b), (a, c) \in E(S)$ with $b \neq c$, the paths $f(a) \rightsquigarrow f(b)$ and $f(a) \rightsquigarrow f(c)$ in $T$ diverge. In this case, $f$ is called a topological embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 6$ is not a homeomorphic embedding, because the path $f(r) \rightsquigarrow f(y)$ contains an intermediate node with more than one child.

81 Taxonomic reconstruction Consensus Definition An A-tree $S$ is a homeomorphic subtree of an A-tree $T$ if there exists a minor embedding $f : S \to T$ satisfying the following extra condition: for every $(a, b) \in E(S)$, the path $f(a) \rightsquigarrow f(b)$ in $T$ is elementary. In this case, $f$ is said to be a homeomorphic embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 4$ is a homeomorphic embedding, because the arcs $(r, x), (r, y) \in E(S)$ become elementary paths $f(r) \rightsquigarrow f(x)$ and $f(r) \rightsquigarrow f(y)$ in $T$ with no intermediate node in $f(V(S))$.
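
A minimal Perl sketch of the homeomorphic-embedding test: a minor-embedding check followed by an elementary-path check for the image of every arc of $S$. As before, $T$ is an assumed reconstruction (arcs (1,2), (1,3), (3,4), (4,5), (4,6)), $S$ has arcs $(r, x)$, $(r, y)$, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of T (arcs (1,2), (1,3), (3,4), (4,5), (4,6))
# and of S (arcs (r,x), (r,y)).
my %parent = (2 => 1, 3 => 1, 4 => 3, 5 => 4, 6 => 4);
my %children;
push @{ $children{ $parent{$_} } }, $_ for sort keys %parent;
my @s_arcs = ([ 'r', 'x' ], [ 'r', 'y' ]);

# The path u ~> v as an array ref of nodes, or undef if u is not an ancestor of v.
sub path {
    my ($u, $v) = @_;
    my @p = ($v);
    while ($p[0] ne $u) {
        return undef unless exists $parent{ $p[0] };
        unshift @p, $parent{ $p[0] };
    }
    return \@p;
}

# Injective, and arcs become paths with no intermediate node in f(V(S)).
sub is_minor_embedding {
    my (%f) = @_;
    my %image;
    for my $s (keys %f) {
        return 0 if $image{ $f{$s} }++;
    }
    for my $arc (@s_arcs) {
        my $p = path($f{ $arc->[0] }, $f{ $arc->[1] }) or return 0;
        for my $i (1 .. $#$p - 1) {
            return 0 if $image{ $p->[$i] };
        }
    }
    return 1;
}

# Homeomorphic: a minor embedding whose arc images are all elementary paths.
sub is_homeomorphic_embedding {
    my (%f) = @_;
    return 0 unless is_minor_embedding(%f);
    for my $arc (@s_arcs) {
        my $p = path($f{ $arc->[0] }, $f{ $arc->[1] });
        for my $i (1 .. $#$p - 1) {
            return 0 if @{ $children{ $p->[$i] } } != 1;    # not elementary
        }
    }
    return 1;
}

print "f = (1,2,4): ", is_homeomorphic_embedding(r => 1, x => 2, y => 4), "\n";
print "f = (1,2,6): ", is_homeomorphic_embedding(r => 1, x => 2, y => 6), "\n";
```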

82 Taxonomic reconstruction Consensus Definition An A-tree $S$ is a homeomorphic subtree of an A-tree $T$ if there exists a minor embedding $f : S \to T$ satisfying the following extra condition: for every $(a, b) \in E(S)$, the path $f(a) \rightsquigarrow f(b)$ in $T$ is elementary. In this case, $f$ is said to be a homeomorphic embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 4$ is not an isomorphic embedding, because the path $f(r) \rightsquigarrow f(y)$ is not an arc.

83 Taxonomic reconstruction Consensus Definition An A-tree $S$ is an isomorphic subtree of an A-tree $T$ if there exists an injective mapping $f : V(S) \to V(T)$ satisfying the following condition: if $(a, b) \in E(S)$, then $(f(a), f(b)) \in E(T)$. Such a mapping $f$ is called an isomorphic embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 3$ is an isomorphic embedding, because it transforms every arc in $S$ into an arc in $T$.
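
A minimal Perl sketch of the isomorphic-embedding test: the mapping must be injective and send every arc of $S$ to an arc of $T$. It assumes the same reconstructed $T$ (arcs (1,2), (1,3), (3,4), (4,5), (4,6)) and $S$ with arcs $(r, x)$, $(r, y)$; `is_isomorphic_embedding` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of T (arcs (1,2), (1,3), (3,4), (4,5), (4,6))
# and of S (arcs (r,x), (r,y)).
my %parent = (2 => 1, 3 => 1, 4 => 3, 5 => 4, 6 => 4);
my @s_arcs = ([ 'r', 'x' ], [ 'r', 'y' ]);

# Isomorphic: injective, and every arc (u,v) of S goes to an arc (f(u), f(v)) of T.
sub is_isomorphic_embedding {
    my (%f) = @_;
    my %image;
    for my $s (keys %f) {
        return 0 if $image{ $f{$s} }++;                      # not injective
    }
    for my $arc (@s_arcs) {
        my ($u, $v) = @$arc;
        return 0 unless ($parent{ $f{$v} } // '') eq $f{$u};  # not an arc of T
    }
    return 1;
}

for my $f ({ r => 1, x => 2, y => 3 }, { r => 4, x => 5, y => 6 }, { r => 1, x => 2, y => 4 }) {
    printf "f(r)=%s f(x)=%s f(y)=%s: %d\n", @$f{qw(r x y)}, is_isomorphic_embedding(%$f);
}
```

The first two mappings are the ones shown on slides 83 and 84; the third is the homeomorphic embedding of slide 82, which fails because $1 \rightsquigarrow 4$ is not an arc.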

84 Taxonomic reconstruction Consensus Definition An A-tree $S$ is an isomorphic subtree of an A-tree $T$ if there exists an injective mapping $f : V(S) \to V(T)$ satisfying the following condition: if $(a, b) \in E(S)$, then $(f(a), f(b)) \in E(T)$. Such a mapping $f$ is called an isomorphic embedding $f : S \to T$. Example [Figure: A-trees $S$, with root $r$ and children $x$ and $y$, and $T$, with nodes 1 to 6.] The mapping $f : V(S) \to V(T)$ defined by $f(r) = 4$, $f(x) = 5$ and $f(y) = 6$ is an isomorphic embedding, because it transforms every arc in $S$ into an arc in $T$.

85 Taxonomic reconstruction Consensus Remark Minor, topological, homeomorphic, and isomorphic embeddings of A-trees are injective mappings satisfying the additional condition that node labels are preserved and reflected. Example [Figure: labelled A-trees $T_1$, $S$ and $T_2$; $S$ has labels A, B, D, E, while $T_1$ and $T_2$ have labels A to F.] There are isomorphic embeddings $S \to T_1$ and $S \to T_2$.

86 Taxonomic reconstruction Consensus Remark Minor, topological, homeomorphic, and isomorphic embeddings of A-trees are injective mappings satisfying the additional condition that node labels are preserved and reflected. Example [Figure: labelled A-trees $T_1$, $S$ and $T_2$; $S$ has labels A, B, D, E, while $T_1$ and $T_2$ have labels A to F.] There are topological embeddings $S \to T_1$ and $S \to T_2$.

87 Taxonomic reconstruction Consensus Lemma The size $M(T_1, T_2)$ of a largest common topological subtree of two A-trees $T_1$ and $T_2$ can be computed in $O(n^{4.5} \log n)$ time. Proof. Mike Steel, Tandy Warnow. Kaikoura Tree Theorems: Computing the Maximum Agreement Subtree. Inform. Process. Lett. 48(2), (1993).

88 Taxonomic reconstruction Consensus Remark The size $M(T_1, T_2)$ of a largest common topological subtree of two binary A-trees $T_1$ and $T_2$ follows a simple recurrence. [Figure: node $v$ of $T_1$ with children $a$ and $b$; node $w$ of $T_2$ with children $c$ and $d$.] $M(T_1[v], T_2[w])$ is the size of $L(T_1[v]) \cap L(T_2[w])$ if $T_1[v]$ or $T_2[w]$ is a singleton; otherwise

$$M(T_1[v], T_2[w]) = \max \left\{ \begin{array}{l} M(T_1[a], T_2[c]) + M(T_1[b], T_2[d]) \\ M(T_1[a], T_2[d]) + M(T_1[b], T_2[c]) \\ M(T_1[a], T_2[w]) \\ M(T_1[b], T_2[w]) \\ M(T_1[v], T_2[c]) \\ M(T_1[v], T_2[d]) \end{array} \right.$$
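
A minimal memoized Perl sketch of this recurrence. It assumes binary A-trees written as nested array refs, where a leaf is its label and an internal node is `[left, right]`; this representation and the names `leaves`, `mast` and `mast_size` are hypothetical. Memoizing over all pairs of subtrees makes it quadratic in the number of subtree pairs, which is fine for small examples but is not one of the algorithms cited on these slides.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Leaf labels L(T) of a binary tree: a leaf is a string, an internal node [left, right].
sub leaves {
    my ($t) = @_;
    return ref($t) ? (leaves($t->[0]), leaves($t->[1])) : ($t);
}

my %memo;    # memo{v}{w} = M(T1[v], T2[w]), keyed by stringified references or labels

sub mast {
    my ($v, $w) = @_;
    return $memo{$v}{$w} if exists $memo{$v}{$w};
    my $m;
    if (!ref($v) || !ref($w)) {
        # T1[v] or T2[w] is a singleton: size of L(T1[v]) intersect L(T2[w]).
        my %in_v = map { $_ => 1 } leaves($v);
        $m = grep { $in_v{$_} } leaves($w);
    } else {
        my ($va, $vb) = @$v;    # children a, b of v
        my ($wc, $wd) = @$w;    # children c, d of w
        $m = max(
            mast($va, $wc) + mast($vb, $wd),
            mast($va, $wd) + mast($vb, $wc),
            mast($va, $w), mast($vb, $w),
            mast($v, $wc), mast($v, $wd),
        );
    }
    return $memo{$v}{$w} = $m;
}

# M(T1, T2); the memo is cleared because references may be reused between calls.
sub mast_size {
    my ($t1, $t2) = @_;
    %memo = ();
    return mast($t1, $t2);
}

my $t1 = [ [ [ 'a', 'b' ], 'c' ], 'd' ];    # (((a,b),c),d)
my $t2 = [ [ 'a', 'b' ], [ 'c', 'd' ] ];    # ((a,b),(c,d))
print "M(T1, T2) = ", mast_size($t1, $t2), "\n";
```

For these two trees the largest common topological subtree is $((a,b),c)$ or $((a,b),d)$, of size 3.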

89 Taxonomic reconstruction Consensus Lemma The size $M(T_1, T_2)$ of a largest common topological subtree of two A-trees $T_1$ and $T_2$ can be computed in $O(n^2)$ time. Proof. Martin Farach, Mikkel Thorup. Fast Comparison of Evolutionary Trees. Inform. Comput. 123(1), (1995). Exercise Implement an efficient Bio::Tree::Agreement module.

90 Taxonomic reconstruction Combination Remark Let x denote isomorphic, homeomorphic, topological, or minor. Theorem The problems of finding a largest common x-subtree and a smallest common x-supertree of two trees, in each case together with a pair of witness x-embeddings, are reducible to each other in time linear in the size of the trees. Proof. Francesc Rosselló, Gabriel Valiente. An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees. To appear in Theoret. Comput. Sci. (2006).

91 Taxonomic reconstruction Combination Theorem The problems of finding a largest common x-subtree and a smallest common x-supertree of two trees, in each case together with a pair of witness x-embeddings, are reducible to each other in time linear in the size of the trees. Proof. [Figure: diagram relating $T_1$, $T_2$, $T_\mu$, $T_p$, $T_{po}$, $T_\sigma$ and $T$.]

92 Taxonomic reconstruction Combination Definition The intersection of two trees $T_1$ and $T_2$ obtained through minor embeddings $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ into a tree $T$ is the graph $T_p$ with set of nodes $V(T_p) = V(T_1) \cap V(T_2)$ and set of arcs defined in the following way: for every $a, b \in V(T_1) \cap V(T_2)$, $(a, b) \in E(T_p)$ if and only if there are paths $a \rightsquigarrow b$ in $T_1$ and in $T_2$ without intermediate nodes in $V(T_1) \cap V(T_2)$. Example Let $T$ be a tree with nodes $a_1, a_2, b, c$ and arcs $(a_1, a_2), (a_2, b), (a_2, c)$, let $T_1$ be its minor with nodes $a_1, b, c$ and arcs $(a_1, b), (a_1, c)$, and let $T_2$ be its minor with nodes $a_2, b, c$ and arcs $(a_2, b), (a_2, c)$. In this case $T_p$ is the graph with nodes $b, c$ and no arc; in particular, it is not a tree.
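
A minimal Perl sketch of the intersection, assuming that the nodes of $T_1$ and $T_2$ are already identified with their images in $T$ and that each tree is stored as a hash with a node list and a parent hash (a hypothetical representation; `path`, `direct_path` and `intersection` are illustrative names). It checks every ordered pair of common nodes, so it is quadratic in their number.

```perl
use strict;
use warnings;

# A tree is { nodes => [...], parent => { child => parent } }, with nodes already
# identified with their images in T (hypothetical representation).
sub path {
    my ($parent, $u, $v) = @_;
    my @p = ($v);
    while ($p[0] ne $u) {
        return undef unless exists $parent->{ $p[0] };
        unshift @p, $parent->{ $p[0] };
    }
    return \@p;
}

# True if the tree has a path u ~> v with no intermediate node in %$common.
sub direct_path {
    my ($tree, $common, $u, $v) = @_;
    my $p = path($tree->{parent}, $u, $v) or return 0;
    my @inner = @$p[ 1 .. $#$p - 1 ];
    return (grep { $common->{$_} } @inner) ? 0 : 1;
}

# T_p: the common nodes, with an arc (u,v) whenever both trees have a path
# u ~> v that avoids the other common nodes.
sub intersection {
    my ($t1, $t2) = @_;
    my %in_t2 = map { $_ => 1 } @{ $t2->{nodes} };
    my %common = map { $_ => 1 } grep { $in_t2{$_} } @{ $t1->{nodes} };
    my @arcs;
    for my $u (sort keys %common) {
        for my $v (sort keys %common) {
            next if $u eq $v;
            push @arcs, [ $u, $v ]
                if direct_path($t1, \%common, $u, $v) && direct_path($t2, \%common, $u, $v);
        }
    }
    return ([ sort keys %common ], \@arcs);
}

# The example above: T_1 with arcs (a1,b), (a1,c) and T_2 with arcs (a2,b), (a2,c).
my $t1 = { nodes => [qw(a1 b c)], parent => { b => 'a1', c => 'a1' } };
my $t2 = { nodes => [qw(a2 b c)], parent => { b => 'a2', c => 'a2' } };
my ($nodes, $arcs) = intersection($t1, $t2);
print "nodes: @$nodes; arcs: ", scalar(@$arcs), "\n";
```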

93 Taxonomic reconstruction Combination Definition The intersection of two trees $T_1$ and $T_2$ obtained through minor embeddings $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ into a tree $T$ is the graph $T_p$ with set of nodes $V(T_p) = V(T_1) \cap V(T_2)$ and set of arcs defined in the following way: for every $a, b \in V(T_1) \cap V(T_2)$, $(a, b) \in E(T_p)$ if and only if there are paths $a \rightsquigarrow b$ in $T_1$ and in $T_2$ without intermediate nodes in $V(T_1) \cap V(T_2)$. Example [Figure: $T$ (nodes $a_1, a_2, b, c$), $T_1$ (nodes $a_1, b, c$), $T_2$ (nodes $a_2, b, c$) and $T_p$ (nodes $b, c$).]

94 Taxonomic reconstruction Combination Theorem For every two trees $T_1$ and $T_2$, any intersection of $T_1$ and $T_2$ obtained through x-embeddings into a smallest common x-supertree of them is a largest common x-subtree of $T_1$ and $T_2$. Corollary Every largest common x-subtree of a pair of trees $T_1$ and $T_2$ is, up to an isomorphism, the intersection of $T_1$ and $T_2$ obtained through their embeddings into a smallest common x-supertree.

95 Taxonomic reconstruction Combination Definition The x-join of two trees $T_1$ and $T_2$ obtained through x-embeddings $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ of a largest common x-subtree $T_\mu$ of them is the quotient graph $T_{po}$ of the disjoint sum $T_1 + T_2$ by the equivalence relation $\theta$ defined, up to symmetry, by the following condition: $(a, b) \in \theta$ if and only if $a = b$ or there exists some $c \in V(T_\mu)$ such that $a = m_1(c)$ and $b = m_2(c)$. Example Let $T_\mu$ be the graph with nodes $b, c$ and no arc, let $T_1$ be the tree with nodes $a_1, b, c$ and arcs $(a_1, b), (a_1, c)$, and let $T_2$ be the tree with nodes $a_2, b, c$ and arcs $(a_2, b), (a_2, c)$. The x-join $T_{po}$ of $T_1$ and $T_2$ through the obvious embeddings of $T_\mu$ is the graph with nodes $a_1, a_2, b, c$ and arcs $(a_1, b), (a_1, c), (a_2, b), (a_2, c)$, but it is not a smallest common x-supertree of $T_1$ and $T_2$, because $T_\mu$ is not a largest common x-subtree of $T_1$ and $T_2$ (it is not even a tree).
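
A minimal Perl sketch of the x-join as a quotient of the disjoint sum. It assumes trees given as lists of arcs `[parent, child]` and the embeddings $m_1$, $m_2$ given as hashes from nodes of $T_\mu$ to nodes of $T_1$ and $T_2$ (a hypothetical representation). Nodes of the disjoint sum are tagged `1:` or `2:`, merged classes are named after the node $c$ of $T_\mu$ (an illustrative naming choice), and parallel arcs are kept once. The function name `x_join` is hypothetical.

```perl
use strict;
use warnings;

# x-join: the quotient of the disjoint sum T1 + T2 that merges m1(c) with m2(c)
# for every node c of T_mu. Trees are lists of arcs [parent, child]; m1 and m2
# are hashes from nodes of T_mu to nodes of T1 and T2 (hypothetical representation).
sub x_join {
    my ($arcs1, $arcs2, $m1, $m2) = @_;
    my %class;    # node of the disjoint sum ("1:n" or "2:n") -> merged class name
    for my $c (keys %$m1) {
        $class{"1:$m1->{$c}"} = $c;    # the merged class is named after c
        $class{"2:$m2->{$c}"} = $c;
    }
    my %seen;
    my @arcs;
    for my $side ([ 1, $arcs1 ], [ 2, $arcs2 ]) {
        my ($tag, $arcs) = @$side;
        for my $arc (@$arcs) {
            my @e = map { $class{"$tag:$_"} // "$tag:$_" } @$arc;
            push @arcs, \@e unless $seen{"$e[0]>$e[1]"}++;    # drop parallel arcs
        }
    }
    return \@arcs;
}

# The example above: T_mu has nodes b, c and no arc.
my $join = x_join([ [qw(a1 b)], [qw(a1 c)] ], [ [qw(a2 b)], [qw(a2 c)] ],
                  { b => 'b', c => 'c' }, { b => 'b', c => 'c' });
print "($_->[0], $_->[1])\n" for @$join;
```

Unmerged nodes keep their side tag, so $a_1$ and $a_2$ appear as `1:a1` and `2:a2`, while $b$ and $c$ are shared.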

96 Taxonomic reconstruction Combination Definition The x-join of two trees $T_1$ and $T_2$ obtained through x-embeddings $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ of a largest common x-subtree $T_\mu$ of them is the quotient graph $T_{po}$ of the disjoint sum $T_1 + T_2$ by the equivalence relation $\theta$ defined, up to symmetry, by the following condition: $(a, b) \in \theta$ if and only if $a = b$ or there exists some $c \in V(T_\mu)$ such that $a = m_1(c)$ and $b = m_2(c)$. Example [Figure: $T_\mu$ (nodes $b, c$), $T_1$ (nodes $a_1, b, c$), $T_2$ (nodes $a_2, b, c$) and their x-join $T_{po}$.]

97 Taxonomic reconstruction Combination Definition The x-sum of two trees $T_1$ and $T_2$ obtained through x-embeddings $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$ of a largest common x-subtree $T_\mu$ of them is the graph $T_\sigma$ obtained from the x-join $T_{po}$ of $T_1$ and $T_2$ by removing every arc that is subsumed by a path; that is, we remove from $T_{po}$ each arc $(v, w)$ for which there is another path $v \rightsquigarrow w$ in $T_{po}$. Example [Figure: $T_\mu$, $T_1$, $T_2$, the x-join $T_{po}$ and the x-sum $T_\sigma$, on nodes $a, b, c, d, e$.]
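
A minimal Perl sketch of the arc-removal step that turns $T_{po}$ into $T_\sigma$, assuming the join is an acyclic graph given as a list of arcs without parallel arcs (`remove_subsumed_arcs` is a hypothetical name). For each arc $(v, w)$ it searches for another path from $v$ to $w$ through a different successor of $v$; this is a straightforward, roughly quadratic check, not the linear-time procedure described in the proof below.

```perl
use strict;
use warnings;

# Remove from an acyclic graph (list of [v, w] arcs, no parallel arcs) every
# arc (v,w) for which there is another path v ~> w.
sub remove_subsumed_arcs {
    my (@arcs) = @_;
    my %succ;
    push @{ $succ{ $_->[0] } }, $_->[1] for @arcs;
    my $reaches = sub {    # depth-first search from $from looking for $to
        my ($from, $to) = @_;
        my @stack = ($from);
        my %seen;
        while (@stack) {
            my $n = pop @stack;
            return 1 if $n eq $to;
            next if $seen{$n}++;
            push @stack, @{ $succ{$n} || [] };
        }
        return 0;
    };
    return grep {
        my ($v, $w) = @$_;
        # keep (v,w) unless some other successor of v also reaches w
        !grep { $_ ne $w && $reaches->($_, $w) } @{ $succ{$v} };
    } @arcs;
}

# Hypothetical join: the arc (a,d) is subsumed by the path a, b, d.
my @kept = remove_subsumed_arcs([qw(a b)], [qw(b d)], [qw(a d)]);
print "($_->[0], $_->[1])\n" for @kept;
```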

98 Taxonomic reconstruction Combination Theorem For every pair of trees $T_1$ and $T_2$, any x-sum of $T_1$ and $T_2$ is a smallest common x-supertree of them. Corollary Every smallest common x-supertree of a pair of trees $T_1$ and $T_2$ is, up to an isomorphism, the x-sum of $T_1$ and $T_2$ obtained through the embeddings of a largest common x-subtree into them.

99 Taxonomic reconstruction Combination Theorem The problems of finding a largest common x-subtree and a smallest common x-supertree of two trees, in each case together with a pair of witness x-embeddings, are reducible to each other in time linear in the size of the trees. Proof. Given a largest common x-subtree $T_\mu$ of two trees $T_1$ and $T_2$ and a pair of witness x-embeddings $m_1 : T_\mu \to T_1$ and $m_2 : T_\mu \to T_2$, a smallest common x-supertree $T_\sigma$ of $T_1$ and $T_2$ can be obtained in time linear in the size of $T_1$ and $T_2$, as follows.

100 Taxonomic reconstruction Combination Theorem The problems of finding a largest common x-subtree and a smallest common x-supertree of two trees, in each case together with a pair of witness x-embeddings, are reducible to each other in time linear in the size of the trees. Proof. First, make copies $T_1'$ and $T_2'$ of $T_1$ and $T_2$, with $l_1 : T_1 \to T_1'$ and $l_2 : T_2 \to T_2'$ identity mappings. Second, sum up $T_1'$ and $T_2'$ into a graph $T_\sigma$. Third, for each $a \in V(T_\mu)$, merge nodes $l_1(m_1(a))$ and $l_2(m_2(a))$, and remove all parallel arcs.

101 Taxonomic reconstruction Combination Theorem The problems of finding a largest common x-subtree and a smallest common x-supertree of two trees, in each case together with a pair of witness x-embeddings, are reducible to each other in time linear in the size of the trees. Proof. Next, remove from $T_\sigma$ all arcs subsumed by paths, as follows. For each node $y \in V(T_\sigma)$ of in-degree 2, let $x, x' \in V(T_\sigma)$ be the source nodes of the two arcs coming into $y$. Now, perform a simultaneous traversal of the paths of arcs coming into $x$ and $x'$, until reaching node $x'$ along the first path or $x$ along the second path. The traversal may stop along one of the paths, because a node of in-degree 0 or in-degree 2 is reached there, while it continues along the other one. Finally, remove from $T_\sigma$ either arc $(x', y)$, if node $x'$ was reached along the first path, or arc $(x, y)$, if node $x$ was reached along the second path.


More information
