BIOINFORMATICS GABRIEL VALIENTE ALGORITHMS, BIOINFORMATICS, COMPLEXITY AND FORMAL METHODS RESEARCH GROUP, TECHNICAL UNIVERSITY OF CATALONIA
1 BIOINFORMATICS GABRIEL VALIENTE ALGORITHMS, BIOINFORMATICS, COMPLEXITY AND FORMAL METHODS RESEARCH GROUP, TECHNICAL UNIVERSITY OF CATALONIA Gabriel Valiente (ALBCOM) Bioinformatics / 86
Introduction

Phylogenetic reconstruction
- April 27: Ultrametric trees
- May 4: Additive and non-additive trees
- May 11: Perfect phylogenies

Taxonomic reconstruction
- May 18: Compatibility
- May 25: Consensus
- June 1: Combination

- Michael S. Waterman (University of Southern California). Introduction to Computational Biology. Chapman & Hall.
- Dan Gusfield (University of California, Davis). Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press.
- Roderic D. M. Page (University of Glasgow) and Edward C. Holmes (University of Oxford). Molecular Evolution: A Phylogenetic Approach. Blackwell Science.
- Gabriel Valiente (Technical University of Catalonia). Algorithms on Trees and Graphs. Springer-Verlag.
- Neil C. Jones, Pavel A. Pevzner (University of California, San Diego). An Introduction to Bioinformatics Algorithms. The MIT Press.
- Arthur M. Lesk (Pennsylvania State University). Introduction to Bioinformatics. 2nd Edition. Oxford University Press.
Ultrametric trees
The (evolutionary) distance $D_{i,j}$ between two species $i$ and $j$ measures the length of time since the species diverged.
[Figure: tree of Brown Bear, Polar Bear, Black Bear, Spectacled Bear, Giant Panda, Raccoon, and Red Panda, drawn with no correlation between evolutionary distances and edge lengths.]
[Figure: the same seven species on a tree drawn against a time axis in MYA, with a tick at 40 MYA.]
Given a weighted tree $T$ with $n$ leaves, compute the length $d^T_{i,j}$ of the path between any two leaves $i$ and $j$.
The length of the path between any two nodes is the sum of the weights of the edges on the path between them.
For example, $d^T_{1,5} = 68$ in the tree shown on the slide.
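The path-length computation can be sketched in plain Perl as follows. This is only an illustration: the adjacency-hash representation, the function name `path_length`, and the small four-edge tree are assumptions made here, and the weights are not those of the tree on the slide.

```perl
use strict;
use warnings;

# Undirected weighted tree as an adjacency hash: $tree->{u}{v} = weight of edge (u, v).
# Returns the length of the unique path from $from to $to, or undef if unreachable.
sub path_length {
    my ($tree, $from, $to, $parent) = @_;
    return 0 if $from eq $to;
    for my $next (keys %{ $tree->{$from} }) {
        next if defined $parent && $next eq $parent;    # do not walk back up
        my $rest = path_length($tree, $next, $to, $from);
        return $tree->{$from}{$next} + $rest if defined $rest;
    }
    return undef;    # $to is not reachable through this branch
}

# Hypothetical tree: leaves a, b, c and internal nodes x, y.
my %tree;
for my $e (['a', 'x', 2], ['b', 'x', 3], ['x', 'y', 4], ['c', 'y', 5]) {
    my ($u, $v, $w) = @$e;
    $tree{$u}{$v} = $tree{$v}{$u} = $w;
}
print path_length(\%tree, 'a', 'c'), "\n";    # sums the edges a-x, x-y, y-c
```

Since a tree has exactly one path between any two nodes, a depth-first walk that never returns to the node it came from is enough; no shortest-path machinery is needed.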
Given an $n \times n$ distance matrix $D$, find a tree $T$ with $n$ leaves that fits the data, that is, such that $d^T_{i,j} = D_{i,j}$ for every two leaves $i$ and $j$.
A matrix $D$ is symmetric non-negative if $D_{i,j} = D_{j,i}$ and $D_{i,j} \geq 0$ for all $i$ and $j$.
A matrix $D$ satisfies the triangle inequality if $D_{i,j} + D_{j,k} \geq D_{i,k}$ for all $i$, $j$, and $k$.
A matrix $D$ is a distance matrix if it is symmetric non-negative, it satisfies the triangle inequality, and $D_{i,j} > 0$ for all $i \neq j$.
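A minimal Perl sketch of these checks, assuming the matrix is given as an array of array references and that the last condition reads $D_{i,j} > 0$ for $i \neq j$ (the comparison symbols are not legible in the slide text). The name `is_distance_matrix` and both sample matrices are made up for illustration.

```perl
use strict;
use warnings;

# Returns 1 if the square matrix $D passes all three conditions, 0 otherwise.
sub is_distance_matrix {
    my ($D) = @_;
    my $n = scalar @$D;
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $n - 1) {
            return 0 if $D->[$i][$j] != $D->[$j][$i];       # symmetry
            return 0 if $D->[$i][$j] < 0;                    # non-negativity
            return 0 if $i != $j && $D->[$i][$j] == 0;       # distinct leaves are apart
            for my $k (0 .. $n - 1) {                        # triangle inequality
                return 0 if $D->[$i][$j] + $D->[$j][$k] < $D->[$i][$k];
            }
        }
    }
    return 1;
}

my @good = ([0, 3, 4], [3, 0, 5], [4, 5, 0]);
my @bad  = ([0, 1, 9], [1, 0, 2], [9, 2, 0]);    # 1 + 2 < 9 breaks the triangle inequality
print is_distance_matrix(\@good), "\n";
print is_distance_matrix(\@bad),  "\n";
```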
There are many ways in which distance matrices can be generated:
- Sequence a particular gene in $n$ species and define $D_{i,j}$ as the edit distance between this gene in species $i$ and species $j$.
- Sequence a particular gene in $n$ species and define $D_{i,j}$ as the alignment distance between this gene in species $i$ and species $j$.
There is only one unrooted binary tree topology $T$ with $n = 3$ leaves: the leaves $i$, $j$, and $k$ are each joined to a central node $c$.
The lengths of the edges in $T$ are defined by three equations with three variables:
$d^T_{i,c} + d^T_{j,c} = D_{i,j}$
$d^T_{i,c} + d^T_{k,c} = D_{i,k}$
$d^T_{j,c} + d^T_{k,c} = D_{j,k}$
whose solution is
$d^T_{i,c} = (D_{i,j} + D_{i,k} - D_{j,k})/2$
$d^T_{j,c} = (D_{i,j} + D_{j,k} - D_{i,k})/2$
$d^T_{k,c} = (D_{i,k} + D_{j,k} - D_{i,j})/2$
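As a quick numerical check of the three-leaf solution, the closed forms can be evaluated directly. The function name `three_leaf_edges` and the distances 7, 9, 10 are illustrative assumptions, not values from the slides.

```perl
use strict;
use warnings;

# Edge lengths from the leaves i, j, k to the central node c of the three-leaf tree.
sub three_leaf_edges {
    my ($Dij, $Dik, $Djk) = @_;
    return (
        ($Dij + $Dik - $Djk) / 2,    # d(i, c)
        ($Dij + $Djk - $Dik) / 2,    # d(j, c)
        ($Dik + $Djk - $Dij) / 2,    # d(k, c)
    );
}

my ($di, $dj, $dk) = three_leaf_edges(7, 9, 10);
print "d(i,c)=$di d(j,c)=$dj d(k,c)=$dk\n";
```

With these inputs the edge lengths are 3, 4, and 6, and adding them pairwise gives back 7, 9, and 10, as the three equations require.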
An unrooted binary tree with $n$ leaves has $2n - 3$ edges.
Fitting any given tree $T$ with $n$ leaves to an $n \times n$ distance matrix $D$ involves solving a system of $\binom{n}{2}$ equations with $2n - 3$ variables.
For $n = 4$, this amounts to solving a system of six equations with only five variables. Such a system does not always have a solution, so it can be hard or impossible to construct such a tree $T$ from $D$.
A distance matrix $D$ is ultrametric if for every three leaves $i$, $j$, and $k$, of the three distances $D_{i,j}$, $D_{i,k}$, $D_{j,k}$ the two largest are equal (three point condition):
$D_{i,j} \leq D_{i,k} = D_{j,k}$, or
$D_{i,k} \leq D_{i,j} = D_{j,k}$, or
$D_{j,k} \leq D_{i,j} = D_{i,k}$.
[Figure: tree on the leaves $i$, $j$, $k$ with edge lengths $a$, $a$, $b$.]
$D_{i,j} = 2a \leq a + b = D_{i,k} = D_{j,k}$ implies $a \leq b$.
It can be determined in $O(n^3)$ time whether or not an $n \times n$ distance matrix $D$ is ultrametric.
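The cubic-time test is a direct loop over all triples of leaves. The sketch below assumes the matrix is an array of array references with exact (for example integer) entries; `is_ultrametric` and the sample matrices are names and data invented for this illustration.

```perl
use strict;
use warnings;

# Three point condition: in every triple, the two largest distances must be equal.
sub is_ultrametric {
    my ($D) = @_;
    my $n = scalar @$D;
    for my $i (0 .. $n - 3) {
        for my $j ($i + 1 .. $n - 2) {
            for my $k ($j + 1 .. $n - 1) {
                my @d = sort { $a <=> $b } ($D->[$i][$j], $D->[$i][$k], $D->[$j][$k]);
                return 0 if $d[1] != $d[2];
            }
        }
    }
    return 1;
}

my @clock = ([0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]);
my @skew  = ([0, 2, 5], [2, 0, 6], [5, 6, 0]);
print is_ultrametric(\@clock), "\n";
print is_ultrametric(\@skew),  "\n";
```

For measured (floating-point) distances, the equality test would need a tolerance, which this sketch leaves out.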
Ultrametric distance matrices model evolutionary trees.
An evolutionary tree is a rooted binary tree with internal nodes labeled by a number and with strictly decreasing labels along any root-to-leaf path.
For every two leaves $i$ and $j$, their distance $D_{i,j}$ is the label of the least common ancestor of species $i$ and $j$.
[Figure: evolutionary tree with leaves A, B, C, D, E, F, G.]
Unweighted Pair Group Method with Arithmetic Mean (UPGMA) is an algorithm for reconstructing a tree $T$ from an ultrametric distance matrix $D$.
P. H. A. Sneath and R. R. Sokal. Numerical Taxonomy: The Principles and Practice of Numerical Classification. W. H. Freeman, San Francisco, 1973.
Starting with $n$ clusters of one element each, merge the two closest clusters until only a single cluster remains.
The distance between two disjoint clusters $C_i$ and $C_j$ is defined as the average inter-cluster pairwise distance,
$D(C_i, C_j) = \frac{1}{|C_i| \, |C_j|} \sum_{p \in C_i} \sum_{q \in C_j} D_{p,q}$
The length of an edge $(u, v)$ is defined as the difference in heights of the vertices $u$ and $v$.
The height plays the role of the molecular clock, and allows one to date the divergence point for every vertex in the evolutionary tree.
The UPGMA algorithm in pseudocode:
    Form $n$ clusters, each with a single element
    Construct a graph $T$ with a vertex $v$ of height $h(v) = 0$ for each cluster
    while there is more than one cluster do
        Find the two closest clusters $C_i$ and $C_j$
        Merge $C_i$ and $C_j$ into a new cluster $C$
        for every cluster $C' \neq C$ do
            Set $D(C, C')$ to the average distance between elements of $C$ and $C'$
        end for
        Add a new vertex $C$ to $T$ and connect it to vertices $C_i$ and $C_j$
        Assign $h(C) = D(C_i, C_j)/2$
        Assign length $h(C) - h(C_i)$ to edge $(C_i, C)$
        Assign length $h(C) - h(C_j)$ to edge $(C_j, C)$
        Remove rows and columns of $D$ corresponding to $C_i$ and $C_j$
        Add a row and column to $D$ for the new cluster $C$
    end while
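The pseudocode can be turned into a short, deliberately naive Perl sketch. It is not the BioPerl implementation: the function name `upgma`, the cluster records, the Newick output, and the sample matrix are all assumptions made here. It also recomputes each cluster average from the original matrix instead of updating rows and columns, so it is slower than the bound quoted for UPGMA.

```perl
use strict;
use warnings;

# UPGMA on a symmetric matrix $D (array of array refs) with leaf names $names.
# Returns a Newick string; branch lengths are differences in node heights.
sub upgma {
    my ($D, $names) = @_;
    my @clusters;
    for my $i (0 .. $#$names) {
        push @clusters, { members => [$i], height => 0, newick => $names->[$i] };
    }
    # Average pairwise distance between the elements of two clusters.
    my $avg = sub {
        my ($x, $y) = @_;
        my $sum = 0;
        for my $p (@{ $x->{members} }) {
            $sum += $D->[$p][$_] for @{ $y->{members} };
        }
        return $sum / (scalar(@{ $x->{members} }) * scalar(@{ $y->{members} }));
    };
    while (@clusters > 1) {
        # Find the two closest clusters (first pair wins on ties).
        my ($bi, $bj, $best);
        for my $i (0 .. $#clusters - 1) {
            for my $j ($i + 1 .. $#clusters) {
                my $d = $avg->($clusters[$i], $clusters[$j]);
                ($bi, $bj, $best) = ($i, $j, $d) if !defined $best || $d < $best;
            }
        }
        my ($ci, $cj) = @clusters[$bi, $bj];
        my $h = $best / 2;    # height of the new vertex
        my $merged = {
            members => [ @{ $ci->{members} }, @{ $cj->{members} } ],
            height  => $h,
            newick  => sprintf("(%s:%g,%s:%g)", $ci->{newick}, $h - $ci->{height},
                               $cj->{newick}, $h - $cj->{height}),
        };
        splice @clusters, $bj, 1;    # remove the larger index first
        splice @clusters, $bi, 1;
        push @clusters, $merged;
    }
    return $clusters[0]{newick} . ";";
}

my @D = ([0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]);
print upgma(\@D, ['A', 'B', 'C', 'D']), "\n";
```

On this hypothetical ultrametric matrix the sketch prints `(D:5,(C:3,(A:1,B:1):2):2);`, a tree in which every leaf sits at the same depth below the root, as the molecular-clock reading of the heights suggests.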
Given an $n \times n$ ultrametric distance matrix $D$, the unique tree $T$ with $n$ leaves that fits the data can be reconstructed in $O(n^2)$ time using the UPGMA algorithm.
[Example: an ultrametric distance matrix over A, B, C, D and the corresponding UPGMA tree; the numeric entries are not preserved in this transcription.]
Given an $n \times n$ distance matrix $D$, the tree $T$ with $n$ leaves that can be reconstructed using the UPGMA algorithm is not necessarily unique, nor does it fit the data, unless $D$ is ultrametric.
[Example: a non-ultrametric distance matrix over A, B, C, D and two UPGMA trees with different edge lengths; the matrix entries are not preserved in this transcription.]
Additive and non-additive trees
A distance matrix $D$ is additive if for every four leaves $i$, $j$, $k$, and $l$, of the three sums of distances $D_{i,j} + D_{k,l}$, $D_{i,k} + D_{j,l}$, $D_{i,l} + D_{j,k}$ the two largest are equal (four point condition):
$D_{i,j} + D_{k,l} \leq D_{i,k} + D_{j,l} = D_{i,l} + D_{j,k}$, or
$D_{i,k} + D_{j,l} \leq D_{i,j} + D_{k,l} = D_{i,l} + D_{j,k}$, or
$D_{i,l} + D_{j,k} \leq D_{i,j} + D_{k,l} = D_{i,k} + D_{j,l}$.
[Figure: unrooted tree on the four leaves $i$, $j$, $k$, $l$.]
It can be determined in $O(n^4)$ time whether or not an $n \times n$ distance matrix $D$ is additive.
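The four point condition can be checked by brute force over all quadruples of leaves. The sketch below is illustrative only: `is_additive` is a name chosen here, and the sample matrix was built from a hypothetical tree in which A and B hang from one internal node by edges of length 1 and 2, C and D hang from another by edges of length 3 and 4, and the two internal nodes are 5 apart.

```perl
use strict;
use warnings;

# Four point condition: among the three pairings of a quadruple,
# the two largest sums of distances must be equal.
sub is_additive {
    my ($D) = @_;
    my $n = scalar @$D;
    for my $i (0 .. $n - 1) {
        for my $j ($i + 1 .. $n - 1) {
            for my $k ($j + 1 .. $n - 1) {
                for my $l ($k + 1 .. $n - 1) {
                    my @s = sort { $a <=> $b } (
                        $D->[$i][$j] + $D->[$k][$l],
                        $D->[$i][$k] + $D->[$j][$l],
                        $D->[$i][$l] + $D->[$j][$k],
                    );
                    return 0 if $s[1] != $s[2];
                }
            }
        }
    }
    return 1;
}

my @tree_like = ([0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]);
my @perturbed = ([0, 3, 9, 10], [3, 0, 10, 12], [9, 10, 0, 7], [10, 12, 7, 0]);
print is_additive(\@tree_like), "\n";
print is_additive(\@perturbed), "\n";
```

In the first matrix the three sums are 10, 20, and 20; changing the single entry for B and D from 11 to 12 makes them 10, 20, and 21, so the condition fails.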
Neighbor Joining (NJ) is an algorithm for reconstructing a tree $T$ from an additive distance matrix $D$.
N. Saitou, M. Nei. The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution 4(4), 1987.
[Figure: neighboring leaves $i$ and $j$ with parent $k$, and another leaf $m$.]
- Find neighboring leaves $i$ and $j$, and assign parent $k$ to them.
- Remove the rows and columns of $i$ and $j$.
- Add a row and column for $k$, with distance to every other leaf $m$ given by
$D_{k,m} = \frac{D_{i,m} + D_{j,m} - D_{i,j}}{2}$
Closest leaves (leaves $i$ and $j$ with minimum $D_{i,j}$) are not necessarily neighbors.
[Figure: weighted tree on the leaves $i$, $j$, $k$, $l$.]
Leaves $i$ and $j$ are neighbors, but $D_{i,j} = 13 > 12 = D_{j,k}$.
Starting with $n$ clusters of one element each, repeatedly merge two clusters that are close to each other and far apart from the rest, until only a single cluster remains.
Define the separation of cluster $C$ from the other clusters as
$u(C) = \frac{1}{\#\text{clusters} - 2} \sum_{C' \neq C} D(C, C')$
Simultaneously minimize $D(C_i, C_j)$ and maximize $u(C_i) + u(C_j)$, that is, minimize $D(C_i, C_j) - u(C_i) - u(C_j)$.
The Neighbor Joining algorithm in pseudocode:
    Form $n$ clusters, each with a single element
    Construct a graph $T$ with an isolated vertex for each cluster
    while there is more than one cluster do
        Find clusters $C_i$ and $C_j$ minimizing $D(C_i, C_j) - u(C_i) - u(C_j)$
        Merge $C_i$ and $C_j$ into a new cluster $C$
        for every cluster $C' \neq C$ do
            Set $D(C, C')$ to the average of $D(C_i, C')$ and $D(C_j, C')$
        end for
        Add a new vertex $C$ to $T$ and connect it to vertices $C_i$ and $C_j$
        Assign length $(D(C_i, C_j) + u(C_i) - u(C_j))/2$ to edge $(C_i, C)$
        Assign length $(D(C_i, C_j) + u(C_j) - u(C_i))/2$ to edge $(C_j, C)$
        Remove rows and columns of $D$ corresponding to $C_i$ and $C_j$
        Add a row and column to $D$ for the new cluster $C$
    end while
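A compact Perl sketch of Neighbor Joining follows, under several assumptions that are choices made here rather than part of the slides: the new distances use the update $D_{k,m} = (D_{i,m} + D_{j,m} - D_{i,j})/2$ given earlier for the parent of two neighbors, instead of the plain average in the pseudocode; the loop stops at two clusters, because the separation $u(C)$ divides by the number of clusters minus 2; and the unrooted result is written as a Newick string split at the last join. The name `neighbor_joining` and the sample matrix are hypothetical.

```perl
use strict;
use warnings;

sub neighbor_joining {
    my ($D, $names) = @_;
    # Working distances keyed by cluster label (the Newick text built so far).
    my %d;
    for my $i (0 .. $#$names) {
        for my $j (0 .. $#$names) {
            $d{ $names->[$i] }{ $names->[$j] } = $D->[$i][$j];
        }
    }
    my @active = @$names;
    while (@active > 2) {
        # Separation u(C): summed distance to the others, over (#clusters - 2).
        my %u;
        for my $c (@active) {
            my $sum = 0;
            $sum += $d{$c}{$_} for grep { $_ ne $c } @active;
            $u{$c} = $sum / (@active - 2);
        }
        # Pair minimizing D(Ci, Cj) - u(Ci) - u(Cj); the first pair wins on ties.
        my ($bi, $bj, $best);
        for my $i (0 .. $#active - 1) {
            for my $j ($i + 1 .. $#active) {
                my ($x, $y) = @active[$i, $j];
                my $q = $d{$x}{$y} - $u{$x} - $u{$y};
                ($bi, $bj, $best) = ($x, $y, $q) if !defined $best || $q < $best;
            }
        }
        my $dij = $d{$bi}{$bj};
        my $new = sprintf("(%s:%g,%s:%g)",
                          $bi, ($dij + $u{$bi} - $u{$bj}) / 2,
                          $bj, ($dij + $u{$bj} - $u{$bi}) / 2);
        @active = grep { $_ ne $bi && $_ ne $bj } @active;
        for my $m (@active) {
            $d{$new}{$m} = $d{$m}{$new} = ($d{$bi}{$m} + $d{$bj}{$m} - $dij) / 2;
        }
        push @active, $new;
    }
    my ($x, $y) = @active;    # the last two clusters are joined by a single edge
    return sprintf("(%s:%g,%s);", $x, $d{$x}{$y}, $y);
}

my @D = ([0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]);
print neighbor_joining(\@D, ['A', 'B', 'C', 'D']), "\n";
```

On this additive sample matrix the sketch prints `((A:1,B:2):5,(C:3,D:4));`, and the path lengths in that tree reproduce every entry of the matrix.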
Given an $n \times n$ additive distance matrix $D$, the unique tree $T$ with $n$ leaves that fits the data can be reconstructed in $O(n^5)$ time using the NJ algorithm.
[Example: an additive distance matrix over A, B, C, D and the corresponding NJ tree; the numeric entries are not preserved in this transcription.]
Given an $n \times n$ distance matrix $D$, the tree $T$ with $n$ leaves that can be reconstructed using the NJ algorithm does not necessarily fit the data unless $D$ is additive.
[Example: a non-additive distance matrix over A, B, C, D and its NJ tree; the numeric entries are not preserved in this transcription.]
Running time of Neighbor Joining in the literature:
- N. Saitou, M. Nei. The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution 4(4), 1987. $O(n^5)$
- J. A. Studier, K. J. Keppler. A Note on the Neighbor-Joining Algorithm of Saitou and Nei. Molecular Biology and Evolution 5(6), 1988. $O(n^3)$
- Richard Durbin (Sanger Centre), Sean R. Eddy, Anders Krogh, Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. Appendix 7.8: Proof of neighbour-joining theorem.
- T. Mailund, G. S. Brodal, R. Fagerberg, C. N. S. Pedersen, D. Phillips. Recrafting the Neighbor-Joining Method. BMC Bioinformatics 7:29, 2006. $O(n^3)$
J. E. Stajich and twenty others. The BioPerl Toolkit: Perl Modules for the Life Sciences. Genome Research 12(10).
BioPerl can be installed from CPAN or from a Debian package:
    perl -MCPAN -e "install Bundle::BioPerl"
    sudo apt-get install bioperl
The distance matrix can be parsed from a Phylip file:
    use Bio::Matrix::IO;
    my $filename = $ARGV[0];    # path to the Phylip distance matrix
    my $parser = Bio::Matrix::IO->new(-format => 'phylip', -file => $filename);
    my $mat = $parser->next_matrix;
The phylogenetic tree can be reconstructed using Neighbor-Joining (NJ) or Unweighted Pair Group Method with Arithmetic Mean (UPGMA):
    use Bio::Tree::DistanceFactory;
    my $method = $ARGV[1];    # UPGMA or NJ
    my $dfactory = Bio::Tree::DistanceFactory->new(-method => $method);
    my $tree = $dfactory->make_tree($mat);
The phylogenetic tree can be output in Newick format:
    use Bio::TreeIO;
    my $output = Bio::TreeIO->new(-format => 'newick');    # writes to standard output
    $output->write_tree($tree);
The phylogenetic tree can also be typeset as a rectangular cladogram:
    use Bio::Tree::Draw::Cladogram;
    use Bio::TreeIO;
    my $input  = $ARGV[0];    # Newick file to read
    my $output = $ARGV[1];    # file to write the drawing to
    my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $input);
    my $tree = $treeio->next_tree;
    my $obj = Bio::Tree::Draw::Cladogram->new(-tree => $tree, -compact => 0);
    $obj->print(-file => $output);
Input distance matrix in Phylip format: 7 taxa (brown, polar, black, spectacled, giant, raccoon, red). [The numeric entries are not preserved in this transcription.]
Output phylogenetic tree (UPGMA) in Newick format [branch lengths not preserved]:
    (((((brown,polar),black),spectacled),giant),(red,raccoon));
Output phylogenetic tree (NJ) in Newick format, for the same input [branch lengths not preserved]:
    (brown,polar,(black,(spectacled,(giant,(raccoon,red)))));
[Figure: phylogenetic tree of Brown Bear, Polar Bear, Black Bear, Spectacled Bear, Giant Panda, Raccoon, and Red Panda.]
[Figure: the UPGMA tree drawn as a rectangular cladogram, with leaf order brown, polar, black, spectacled, giant, red, raccoon.]
[Figure: the NJ tree drawn as a rectangular cladogram, with leaf order brown, polar, black, spectacled, giant, raccoon, red.]
Perfect phylogenies
Given an $n \times m$ genomic matrix $M$, find a tree $T$ with $n$ leaves that fits the data.
[Example: tree on Lamprey, Salmon, Shark, and Lizard.]
[Example: character matrix for lamprey, shark, salmon, and lizard over the characters paired fins, jaws, large dermal bones, fin rays, lungs, and rasping tongue; the entries are not preserved in this transcription.]
Let $M$ be an $n \times m$ genomic matrix.
Biological interpretation (Cladistics):
- $n$ taxa
- $m$ cladistic characters
- two states, 0 (absent) and 1 (present)
- unordered
Biological interpretation (Genomics):
- $n$ sequences
- $m$ sites, possibly SNP sites
- two states, 0 and 1
- ordered (on the chromosome)
Dan Gusfield. Efficient Algorithms for Inferring Evolutionary Trees. Networks 21(1):19-28, 1991.
    Radix sort $M$ by columns in decreasing order
    Transform $M$ into $M'$ by removing repeated columns
    Let $O$ be the set of entries in $M'$ with value 1
    for each $(i, j) \in O$ do
        set $L(i, j)$ to the largest index $k < j$ such that $(i, k) \in O$
        set $L(i, j)$ to 0 if there is no such index $k$
    end for
    for each $1 \leq j \leq m$ do
        set $L(j)$ to the largest $L(i, j)$ such that $(i, j) \in O$
    end for
Theorem: $M'$ has a phylogenetic tree if and only if $L(i, j) = L(j)$ for every $(i, j) \in O$.
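The test can be sketched in Perl as below. Several details are assumptions of this sketch rather than of the slides: the matrix is an array of 0/1 rows, each column is read top to bottom as a binary string so that a plain string sort stands in for the radix sort, and the function name `has_perfect_phylogeny` and both sample matrices are chosen here for illustration.

```perl
use strict;
use warnings;

# Returns 1 if the binary matrix $M passes the L(i,j) = L(j) test, 0 otherwise.
sub has_perfect_phylogeny {
    my ($M) = @_;
    my $n = scalar @$M;
    my $m = scalar @{ $M->[0] };
    # Columns as binary strings, sorted in decreasing order, repeats dropped.
    my %seen;
    my @cols = grep { !$seen{$_}++ }
               sort { $b cmp $a }
               map  { my $j = $_; join '', map { $M->[$_][$j] } 0 .. $n - 1 } 0 .. $m - 1;
    my $c = scalar @cols;
    # L(i,j): largest column k < j with a 1 in row i (0 if none); columns are 1-based.
    my @Lcol = (0) x ($c + 1);
    my @Lij;
    for my $i (0 .. $n - 1) {
        my $last = 0;
        for my $j (1 .. $c) {
            next unless substr($cols[$j - 1], $i, 1) eq '1';
            $Lij[$i][$j] = $last;
            $Lcol[$j] = $last if $last > $Lcol[$j];    # L(j) is the largest L(i,j)
            $last = $j;
        }
    }
    for my $i (0 .. $n - 1) {
        for my $j (1 .. $c) {
            next unless defined $Lij[$i][$j];
            return 0 if $Lij[$i][$j] != $Lcol[$j];
        }
    }
    return 1;
}

my @ok  = ([1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 1], [0, 0, 1, 1, 0], [0, 1, 0, 0, 0]);
my @bad = ([1, 1, 0], [0, 1, 1], [1, 0, 1]);
print has_perfect_phylogeny(\@ok),  "\n";
print has_perfect_phylogeny(\@bad), "\n";
```

The second matrix fails because its columns overlap pairwise without any of them containing another, which no tree can explain with one change per character.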
The tree is then built from the values $L(j)$ (Gusfield, 1991):
    Create a node $n_j$ for each column $j$ of $M'$
    for each node $n_j$ with $L(j) > 0$ do
        Make node $n_{L(j)}$ the parent of node $n_j$
        Label the edge with $j$ and the indexes of all columns identical to $j$
    end for
    Create a root node $r$
    for each node $n_j$ with $L(j) = 0$ do
        Make node $r$ the parent of node $n_j$
        Label the edge with $j$ and the indexes of all columns identical to $j$
    end for
Finally, the rows of $M'$ are attached as leaves (Gusfield, 1991):
    for each $1 \leq i \leq n$ do
        Let $c_i$ be the largest index such that $M'[i, c_i] = 1$
        Let $(n_j, n_k)$ be the edge labeled with $c_i$
        if node $n_k$ is a leaf then
            Label node $n_k$ with $i$
        else
            Create a leaf node $n_l$
            Make node $n_k$ the parent of node $n_l$
            Label node $n_l$ with $i$
        end if
    end for
Theorem: The resulting tree $T$ is a phylogenetic tree for the genomic matrix $M$.
Given an $n \times m$ genomic matrix $M$, the unique tree $T$ with $n$ leaves that fits the data, if it exists, can be reconstructed in $O(nm)$ time using the previous algorithm.
[Example: tree with leaves A, B, C, D, E and edges labeled by column indexes.]
Example (Test $M$ for a phylogenetic tree): [the matrix $M$, the column permutation $\Pi$, and the values $L(i, j)$ and $L(j)$ are not preserved in this transcription.] $M$ has a phylogenetic tree, because $L(i, j) = L(j)$ for every $(i, j) \in O$.
Example (Test $M$ for a phylogenetic tree): [matrices not preserved.] $M$ has no phylogenetic tree, because $L(1, 2) \neq L(2)$, and also because $L(4, 3) \neq L(3)$.
Example (Build a phylogenetic tree $T$ for $M$): [Figure sequence: nodes $n_1, \ldots, n_5$ are created, linked to their parents $n_{L(j)}$ or to the root $r$ with edges labeled 1 to 5, and finally labeled with the taxa A, B, C, D, E; the matrix $M$, the permutation $\Pi$, and the values of $L$ are not preserved in this transcription.]
The genomic matrix can be parsed from a Phylip file:
    use Bio::Matrix::IO;
    my $filename = $ARGV[0];    # path to the Phylip matrix
    my $parser = Bio::Matrix::IO->new(-format => 'phylip', -file => $filename);
    my $mat = $parser->next_matrix;
The phylogenetic tree, if it exists, can be reconstructed using the previous algorithm.
Exercise: Implement a Bio::Tree::PerfectPhylogeny module.
Taxonomic reconstruction: Compatibility
Compatible trees with overlapping taxa can be combined into a single supertree containing the evolutionary information of the given trees. Incompatible trees do not admit their simultaneous inclusion into a common supertree.
Two or more phylogenetic trees with nested taxa are ancestrally compatible if they can be refined into a common supertree.
Two or more phylogenetic trees with nested taxa are perfectly compatible if there exists a common supertree whose topological restriction to the taxa in each tree is isomorphic to that tree.
- Philip Daniel, Charles Semple. Supertree Algorithms for Nested Taxa. In: Olaf R. P. Bininda-Emonds (ed.) Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Computational Biology, vol. 4, chap. 7. Kluwer (2004).
- Charles Semple, Philip Daniel, Wim Hordijk, Roderic D. M. Page, Mike Steel. Supertree Algorithms for Ancestral Divergence Dates and Nested Taxa. Bioinformatics 20(15) (2004).
Example: Two compatible phylogenetic trees obtained from TreeBASE.
[Figure: one tree over Loganiaceae, Rubiaceae, Viburnum, Columellia, Caprifoliaceae and one over Viburnum, Columellia, Heptacodium, Diervilla, Linnaea.]
Incompatible phylogenetic trees can still be partially combined into a maximum agreement subtree.
- Mike A. Steel, Tandy Warnow. Kaikoura Tree Theorems: Computing the Maximum Agreement Subtree. Information Processing Letters 48(2) (1993).
Compatible phylogenetic trees can be combined into a common supertree.
- B. R. Baum. Combining Trees as a Way of Combining Datasets for Phylogenetic Inference, and the Desirability of Combining Gene Trees. Taxon 41(1), 3-10 (1992).
- M. A. Ragan. Phylogenetic Inference based on Matrix Representation of Trees. Molecular Phylogenetics and Evolution 1(1) (1992).
- Charles Semple, Mike A. Steel. A Supertree Method for Rooted Trees. Discrete Applied Mathematics 105(1-3) (2000).
- Roderic D. M. Page. Modified Mincut Supertrees. In: Proc. 2nd Int. Workshop Algorithms in Bioinformatics, Lecture Notes in Computer Science, vol. 2452. Springer-Verlag (2002).
Definition: Let $\mathcal{A}$ be a fixed set of labels. A node of a rooted tree with only one child is an elementary node. A semi-labeled tree over $\mathcal{A}$ is a rooted tree with some of its nodes, including all its leaves and all its elementary nodes, injectively labeled in the set $\mathcal{A}$. An A-tree is a rooted tree with some of its nodes, including all its leaves, injectively labeled in the set $\mathcal{A}$. The set of the labels of the leaves of an A-tree $T$ is denoted by $\mathcal{L}(T)$, and the set of the labels of all its nodes is denoted by $\mathcal{A}(T)$.
Example: A semi-labeled tree (left) and an A-tree (right). [Figure: two trees over the labels A, C, D, H, L, V, the first with an additional label X.]
Definition: Let $T$ be an A-tree. For every $v \in V(T)$, the cluster of $v$ in $T$ is the set $\mathcal{A}_T(v)$ of the labels of all its descendants, including itself. The cluster representation of $T$ is $C_{\mathcal{A}}(T) = \{\mathcal{A}_T(v) \mid v \in V(T)\}$.
Example: The cluster representation of the A-tree over the labels A, H, V, C, D, L shown on the slide is
{ {C}, {D}, {H}, {L}, {V}, {C, V}, {D, L}, {A, D, H, L}, {A, C, D, H, L, V} }.
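A cluster representation can be computed with one bottom-up traversal. The Perl sketch below does not use Bio::Tree::Compatible; the nested-hash tree encoding and the helper names `node` and `clusters` are assumptions made here. The sample tree is one shape consistent with the clusters listed in the example: an unlabeled root, an internal node labeled A with a leaf H and an unlabeled parent of D and L, and an unlabeled parent of C and V.

```perl
use strict;
use warnings;

# A node is a hash with an optional label and a list of child nodes.
sub node { my ($label, @children) = @_; return { label => $label, children => [@children] } }

# Appends the cluster of every node under $v to @$out (as comma-joined sorted
# labels) and returns the cluster of $v itself as an array reference.
sub clusters {
    my ($v, $out) = @_;
    my @labels = defined $v->{label} ? ($v->{label}) : ();
    push @labels, @{ clusters($_, $out) } for @{ $v->{children} };
    my @sorted = sort @labels;
    push @$out, join(',', @sorted);
    return \@sorted;
}

my $tree = node(undef,
    node('A', node('H'), node(undef, node('D'), node('L'))),
    node(undef, node('C'), node('V')));
my @rep;
clusters($tree, \@rep);
print "{ $_ }\n" for sort { length($a) <=> length($b) || $a cmp $b } @rep;
```

Printed from smallest to largest, the nine clusters match the list in the example, with the labeled internal node A appearing inside its own cluster {A, D, H, L}.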
Definition: The restriction $T|X$ of an A-tree $T$ to a set $X \subseteq \mathcal{A}$ of labels is the subtree of $T$ supported on the set of nodes $V(T|X) = \{v \in V(T) \mid \mathcal{A}_T(v) \cap X \neq \emptyset\}$, where a node is labeled when it is labeled in $T$ and this label belongs to $X$, in which case its label in $T|X$ is the same as in $T$.
Example: An A-tree (left) and its restriction to the set of labels {A, C, D} (right). [Figure: a tree over A, C, D, H, L, V and its restriction over A, C, D.]
Theorem (Llabrés, Rocha, Rosselló, Valiente 2006): Let $T_1$ and $T_2$ be two A-trees with $\mathcal{A}(T_1) = \mathcal{A}(T_2)$. Then, $T_1$ and $T_2$ are ancestrally compatible if and only if $C_{\mathcal{A}}(T_1)$ and $C_{\mathcal{A}}(T_2)$ jointly satisfy the following two conditions:
- For every $A \in \mathcal{A}(T_1) = \mathcal{A}(T_2)$, the smallest member of $C_{\mathcal{A}}(T_1)$ containing $A$ is equal to the smallest member of $C_{\mathcal{A}}(T_2)$ containing this label.
- For every $X \in C_{\mathcal{A}}(T_1)$ and $Y \in C_{\mathcal{A}}(T_2)$, if $X \cap Y \neq \emptyset$, then $X \subseteq Y$ or $Y \subseteq X$.
Mercè Llabrés, Jairo Rocha, Francesc Rosselló, Gabriel Valiente. On the Ancestral Compatibility of Two Phylogenetic Trees with Nested Taxa. To appear in J. Math. Biol. (2006).
Example: Two incompatible phylogenetic trees obtained from TreeBASE.
[Figure: two trees over the taxa Convallaria, Peliosanthes, Geitonoplesium, Phormium, Herreria, Asparagus, Ruscus, Uvularia, Tricyrtis, Trillium, Alstroemeria, Luzuriaga, Philesia, Dioscoreaceae, Smilax, Stemonaceae, Ripogonum, Petermannia, Taccaceae.]
Example: Two incompatible semi-labeled trees obtained from TreeBASE, one of which has a cluster labeled by a taxon in the other tree.
[Figure: one tree over Vernonia, Asteroideae, Blumea, Inula and one over Asteroideae, Vernonia, Inula, Gnaphalium, Antennaria.]
Example: Two compatible phylogenetic trees obtained from TreeBASE.
[Figure: one tree over Loganiaceae, Rubiaceae, Viburnum, Columellia, Caprifoliaceae and one over Viburnum, Columellia, Heptacodium, Diervilla, Linnaea.]
Example: Two incompatible semi-labeled trees obtained from TreeBASE, in which an incompatible triple of labels involves three taxa in one tree and two taxa plus one internal label in the other tree.
[Figure: one tree over Loganiaceae, Rubiaceae, Viburnum, Columellia, Caprifoliaceae and one over Caprifoliaceae, Viburnum, Columellia, Heptacodium, Diervilla, Linnaea.]
Algorithm (Ancestral compatibility):
    $\mathcal{A}' := \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$; $T'_1 := T_1|\mathcal{A}'$; $T'_2 := T_2|\mathcal{A}'$
    for each label $A \in \mathcal{A}'$ do
        let $X_1$ be the smallest member of $C_{\mathcal{A}}(T'_1)$ containing $A$
        let $X_2$ be the smallest member of $C_{\mathcal{A}}(T'_2)$ containing $A$
        if $X_1 \neq X_2$ then
            return $X_1$ and $X_2$ are incompatible
        end if
    end for
    for each cluster $X_1 \in C_{\mathcal{A}}(T'_1)$ do
        for each cluster $X_2 \in C_{\mathcal{A}}(T'_2)$ do
            if $X_1 \cap X_2 \neq \emptyset$ and $X_1 \not\subseteq X_2$ and $X_2 \not\subseteq X_1$ then
                return $X_1$ and $X_2$ are incompatible
            end if
        end for
    end for
    return $T_1$ and $T_2$ are compatible
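The two loops of the algorithm can be sketched in plain Perl, independently of Bio::Tree::Compatible. The sketch makes several simplifying assumptions: both trees are already restricted to a common label set, trees are nested hashes built by a hypothetical `node` helper, clusters are stored as hashes of labels, and the function returns only 1 or 0 instead of the offending clusters. It relies on the observation that the smallest cluster containing a label is the cluster of the node carrying that label.

```perl
use strict;
use warnings;

sub node { my ($label, @children) = @_; return { label => $label, children => [@children] } }

# Fills %$own (label => cluster of the node carrying it) and @$all (every cluster);
# each cluster is a hash of labels. Returns the cluster of $v.
sub collect {
    my ($v, $own, $all) = @_;
    my %c;
    $c{ $v->{label} } = 1 if defined $v->{label};
    for my $child (@{ $v->{children} }) {
        %c = (%c, %{ collect($child, $own, $all) });
    }
    $own->{ $v->{label} } = \%c if defined $v->{label};
    push @$all, \%c;
    return \%c;
}

sub cluster_key { return join ',', sort keys %{ $_[0] } }

# Returns 1 if the two trees (assumed to carry the same labels) pass both conditions.
sub compatible {
    my ($t1, $t2) = @_;
    my (%own1, %own2, @all1, @all2);
    collect($t1, \%own1, \@all1);
    collect($t2, \%own2, \@all2);
    for my $label (keys %own1) {                  # smallest clusters must agree
        return 0 if cluster_key($own1{$label}) ne cluster_key($own2{$label});
    }
    for my $x (@all1) {                           # overlapping clusters must be nested
        for my $y (@all2) {
            my $common = grep { $y->{$_} } keys %$x;
            next if $common == 0;
            return 0 if $common < keys(%$x) && $common < keys(%$y);
        }
    }
    return 1;
}

my $t1   = node(undef, node(undef, node('A'), node('B')), node('C'));
my $t2   = node(undef, node('A'), node(undef, node('B'), node('C')));
my $star = node(undef, node('A'), node('B'), node('C'));
print compatible($t1, $t2),   "\n";    # {A,B} and {B,C} overlap without nesting
print compatible($t1, $star), "\n";    # the star tree can be refined into $t1
```

Like the algorithm on the slide, this compares every pair of clusters, so its running time is quadratic in the size of the trees.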
The A-trees can be parsed from a Newick file, and they can be tested for ancestral compatibility:
    use Bio::Tree::Compatible;
    use Bio::TreeIO;
    my $filename = $ARGV[0];    # Newick file holding the two trees
    my $input = Bio::TreeIO->new(-format => 'newick', -file => $filename);
    my $t1 = $input->next_tree;
    my $t2 = $input->next_tree;
    my ($incompat, $ilabels, $inodes) = $t1->Bio::Tree::Compatible::is_compatible($t2);
The cluster representation of the trees is the basis for a certificate of incompatibility:
    if ($incompat) {
        print "the trees are incompatible\n";
        my %cluster1 = %{ $t1->Bio::Tree::Compatible::cluster_representation };
        my %cluster2 = %{ $t2->Bio::Tree::Compatible::cluster_representation };
$T_1$ and $T_2$ are incompatible if, for some label $A \in \mathcal{A}(T_1) = \mathcal{A}(T_2)$, the smallest member of $C_{\mathcal{A}}(T_1)$ and of $C_{\mathcal{A}}(T_2)$ containing $A$ differ:
        if (scalar(@$ilabels)) {
            foreach my $label (@$ilabels) {
                my $n1 = $t1->find_node(-id => $label);
                my $n2 = $t2->find_node(-id => $label);
                my @c1 = sort @{ $cluster1{$n1} };    # cluster of the label in the first tree
                my @c2 = sort @{ $cluster2{$n2} };    # cluster of the label in the second tree
                print "label $label";
                print " cluster"; map { print " ", $_ } @c1;
                print " cluster"; map { print " ", $_ } @c2;
                print "\n";
            }
        }
$T_1$ and $T_2$ are incompatible if, for some $X \in C_{\mathcal{A}}(T_1)$ and $Y \in C_{\mathcal{A}}(T_2)$, the clusters $X$ and $Y$ overlap but neither is contained in the other:
        if (scalar(@$inodes)) {
            while (@$inodes) {
                my $n1 = shift @$inodes;    # the nodes come in pairs, one from each tree
                my $n2 = shift @$inodes;
                my @c1 = sort @{ $cluster1{$n1} };
                my @c2 = sort @{ $cluster2{$n2} };
                print "cluster"; map { print " ", $_ } @c1;
                print " properly intersects cluster"; map { print " ", $_ } @c2;
                print "\n";
            }
        }
    } else {
        print "the trees are compatible\n";
    }
Lemma: The ancestral compatibility algorithm takes time quadratic in the size of the trees.
Proof: The size of the cluster representation is bounded by the size of the tree, and the bound is tight in the worst case.
Exercise: Improve the ancestral compatibility algorithm to run in time linear in the size of the trees.
Exercise: Implement a linear time Bio::Tree::Compatible module.
Taxonomic reconstruction: Consensus
Definition: A path $(v_0, v_1, \ldots, v_k)$ in an A-tree $T$ is elementary if, for every $i = 1, \ldots, k - 1$, $v_{i+1}$ is the only child of $v_i$; in other words, if all its intermediate nodes have out-degree 1. In particular, an arc forms an elementary path.
Example: Path $1 \leadsto 5$ is not elementary. [Figure: tree with numbered nodes.]
Definition: Two non-trivial paths $(a, v_1, \ldots, v_k)$ and $(a, w_1, \ldots, w_l)$ in an A-tree $T$ are said to diverge if their origin $a$ is their only common node.
Example: Paths $1 \leadsto 5$ and $1 \leadsto 6$ do not diverge. [Figure: tree with numbered nodes.]
76 Taxonomic reconstruction Consensus Definition An A-tree S is a minor of an A-tree T if there exists an injective mapping f : V (S) V (T ) satisfying the following condition: for every a, b V (S), if (a, b) E(S), then there exists a path f (a) f (b) in T with no intermediate node in f (V (S)). In this case, the mapping f is said to be a minor embedding f : S T. Example y S r T 1 x The mapping f : V (S) V (T ) defined by f (r) = 1, f (x) = 3 and f (y) = 4 is not a minor embedding, because, although it transforms arcs in S into paths in T, the path f (r) f (y) contains the node 3 = f (x), which belongs to f (V (S)). Gabriel Valiente (ALBCOM) Bioinformatics / 86
77 Taxonomic reconstruction Consensus
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 5$ and $f(y) = 6$ is a minor embedding, because the arcs $(r, x), (r, y) \in E(S)$ become paths $f(r) \leadsto f(x)$ and $f(r) \leadsto f(y)$ in $T$ with no intermediate node in $f(V(S))$.
78 Taxonomic reconstruction Consensus
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 5$ and $f(y) = 6$ is not a topological embedding, because the paths $f(r) \leadsto f(x)$ and $f(r) \leadsto f(y)$ do not diverge.
79 Taxonomic reconstruction Consensus
Definition. An A-tree $S$ is a topological subtree of an A-tree $T$ if there exists a minor embedding $f \colon S \to T$ such that, for every $(a, b), (a, c) \in E(S)$ with $b \neq c$, the paths $f(a) \leadsto f(b)$ and $f(a) \leadsto f(c)$ in $T$ diverge. In this case, $f$ is called a topological embedding $f \colon S \to T$.
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 6$ is a topological embedding, because the arcs $(r, x), (r, y) \in E(S)$ become divergent paths $f(r) \leadsto f(x)$ and $f(r) \leadsto f(y)$ in $T$ without intermediate nodes in $f(V(S))$.
80 Taxonomic reconstruction Consensus
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 6$ is not a homeomorphic embedding, because the path $f(r) \leadsto f(y)$ contains an intermediate node with more than one child.
81 Taxonomic reconstruction Consensus
Definition. An A-tree $S$ is a homeomorphic subtree of an A-tree $T$ if there exists a minor embedding $f \colon S \to T$ satisfying the following extra condition: for every $(a, b) \in E(S)$, the path $f(a) \leadsto f(b)$ in $T$ is elementary. In this case, $f$ is said to be a homeomorphic embedding $f \colon S \to T$.
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 4$ is a homeomorphic embedding, because the arcs $(r, x), (r, y) \in E(S)$ become elementary paths $f(r) \leadsto f(x)$ and $f(r) \leadsto f(y)$ in $T$ with no intermediate node in $f(V(S))$.
82 Taxonomic reconstruction Consensus
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 4$ is not an isomorphic embedding, because the path $f(r) \leadsto f(y)$ is not an arc.
83 Taxonomic reconstruction Consensus
Definition. An A-tree $S$ is an isomorphic subtree of an A-tree $T$ if there exists an injective mapping $f \colon V(S) \to V(T)$ satisfying the following condition: if $(a, b) \in E(S)$, then $(f(a), f(b)) \in E(T)$. Such a mapping $f$ is called an isomorphic embedding $f \colon S \to T$.
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 1$, $f(x) = 2$ and $f(y) = 3$ is an isomorphic embedding, because it transforms every arc in $S$ into an arc in $T$.
84 Taxonomic reconstruction Consensus
Example. [Figure: the trees $S$ and $T$.] The mapping $f \colon V(S) \to V(T)$ defined by $f(r) = 4$, $f(x) = 5$ and $f(y) = 6$ is an isomorphic embedding, because it transforms every arc in $S$ into an arc in $T$.
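The four kinds of embedding can be told apart by one routine that computes the image path of every arc of $S$ and then tests the successive conditions. The following Python sketch makes two assumptions that should be read as such: the tree $T$ with arcs (1,2), (1,3), (3,4), (4,5), (4,6) is a reconstruction that agrees with all six example mappings (the original figure is not available), and the function names `tree_path` and `classify` are hypothetical. Node labels are ignored.

```python
def tree_path(parent, u, v):
    """Path u ~> v as a list of nodes, or None if u is not an ancestor of v."""
    path = [v]
    while path[-1] != u:
        if path[-1] not in parent:       # climbed past the root without meeting u
            return None
        path.append(parent[path[-1]])
    return path[::-1]

def classify(s_arcs, t_arcs, f):
    """Return the kinds of embedding (minor, topological, ...) that f satisfies."""
    parent = {b: a for a, b in t_arcs}
    outdeg = {}
    for a, _ in t_arcs:
        outdeg[a] = outdeg.get(a, 0) + 1
    image = set(f.values())
    if len(image) != len(f):             # f is not injective
        return set()
    paths = {(a, b): tree_path(parent, f[a], f[b]) for a, b in s_arcs}
    # Minor: every arc becomes a path with no intermediate node in f(V(S)).
    if any(p is None or len(p) < 2 or image & set(p[1:-1]) for p in paths.values()):
        return set()
    kinds = {"minor"}
    # Topological: image paths of two arcs leaving the same node diverge.
    if all(set(paths[p][1:]).isdisjoint(paths[q][1:])
           for p in paths for q in paths if p[0] == q[0] and p[1] != q[1]):
        kinds.add("topological")
    # Homeomorphic: every image path is elementary.
    if all(outdeg.get(v, 0) == 1 for p in paths.values() for v in p[1:-1]):
        kinds.add("homeomorphic")
    # Isomorphic: every image path is a single arc.
    if all(len(p) == 2 for p in paths.values()):
        kinds.add("isomorphic")
    return kinds

S = [("r", "x"), ("r", "y")]
T = [(1, 2), (1, 3), (3, 4), (4, 5), (4, 6)]   # assumed reconstruction of T
for f in ({"r": 1, "x": 3, "y": 4}, {"r": 1, "x": 5, "y": 6},
          {"r": 1, "x": 2, "y": 6}, {"r": 1, "x": 2, "y": 4},
          {"r": 1, "x": 2, "y": 3}, {"r": 4, "x": 5, "y": 6}):
    print(f, sorted(classify(S, T, f)))
```

Under the assumed $T$, the routine reproduces the verdicts of the examples: the first mapping satisfies none of the conditions, and each later mapping satisfies at least as many as the one before it.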
85 Taxonomic reconstruction Consensus
Remark. Minor, topological, homeomorphic, and isomorphic embeddings of A-trees are injective mappings satisfying the additional condition that node labels are preserved and reflected.
Example. [Figure: the trees $T_1$ (labels A, B, C, D, E, F), $S$ (labels A, B, D, E) and $T_2$ (labels A, B, F, D, E, C).] There are isomorphic embeddings $S \to T_1$ and $S \to T_2$.
86 Taxonomic reconstruction Consensus
Example. [Figure: the trees $T_1$ (labels A, B, C, D, E, F), $S$ (labels A, B, D, E) and $T_2$ (labels A, B, F, D, E, C).] There are topological embeddings $S \to T_1$ and $S \to T_2$.
87 Taxonomic reconstruction Consensus
Lemma. The size $M(T_1, T_2)$ of a largest common topological subtree of two A-trees $T_1$ and $T_2$ can be computed in $O(n^{4.5} \log n)$ time.
Proof. Mike Steel, Tandy Warnow. Kaikoura Tree Theorems: Computing the Maximum Agreement Subtree. Inform. Process. Lett. 48(2) (1993).
88 Taxonomic reconstruction Consensus
Remark. The size $M(T_1, T_2)$ of a largest common topological subtree of two binary A-trees $T_1$ and $T_2$ follows a simple recurrence.
[Figure: a node $v$ of $T_1$ with children $a$ and $b$, and a node $w$ of $T_2$ with children $c$ and $d$.]
$M(T_1[v], T_2[w])$ is the size of $L(T_1[v]) \cap L(T_2[w])$ if $T_1[v]$ or $T_2[w]$ is a singleton; otherwise
$$M(T_1[v], T_2[w]) = \max \left\{ \begin{array}{l} M(T_1[a], T_2[c]) + M(T_1[b], T_2[d]) \\ M(T_1[a], T_2[d]) + M(T_1[b], T_2[c]) \\ M(T_1[a], T_2[w]) \\ M(T_1[b], T_2[w]) \\ M(T_1[v], T_2[c]) \\ M(T_1[v], T_2[d]) \end{array} \right.$$
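The recurrence translates almost literally into a memoized recursive function. The sketch below is an illustration under stated assumptions: a binary tree is encoded as either a leaf label (a string) or a pair `(left, right)`, sizes are counted in leaves as in the base case, and the names `leaves` and `mast` are hypothetical. It makes no attempt to reach the running times quoted in the lemmas.

```python
from functools import lru_cache

def leaves(t):
    """Leaf-label set of a tree given as a label (leaf) or a pair (left, right)."""
    return frozenset([t]) if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

@lru_cache(maxsize=None)
def mast(v, w):
    """Size M(v, w) of a largest common topological subtree of binary trees v, w."""
    if isinstance(v, str) or isinstance(w, str):      # one subtree is a singleton
        return len(leaves(v) & leaves(w))
    (a, b), (c, d) = v, w
    return max(mast(a, c) + mast(b, d),               # match children straight
               mast(a, d) + mast(b, c),               # match children crossed
               mast(a, w), mast(b, w),                # skip the root v of T1[v]
               mast(v, c), mast(v, d))                # skip the root w of T2[w]

# Two hypothetical trees on the labels A, B, C, D.
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(mast(t1, t2), mast(t1, t1))
```

Nested tuples are hashable, so `lru_cache` can memoize on pairs of subtrees directly; each pair of nodes is then evaluated once.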
89 Taxonomic reconstruction Consensus
Lemma. The size $M(T_1, T_2)$ of a largest common topological subtree of two A-trees $T_1$ and $T_2$ can be computed in $O(n^2)$ time.
Proof. Martin Farach, Mikkel Thorup. Fast Comparison of Evolutionary Trees. Inform. Comput. 123(1) (1995).
Exercise. Implement an efficient Bio::Tree::Agreement module.
90 Taxonomic reconstruction Combination
Remark. Let x denote isomorphic, homeomorphic, topological, or minor.
Theorem. The problems of finding a largest common x-subtree and a smallest common x-supertree of two trees, in each case together with a pair of witness x-embeddings, are reducible to each other in time linear in the size of the trees.
Proof. Francesc Rosselló, Gabriel Valiente. An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees. To appear in Theoret. Comput. Sci. (2006).
91 Taxonomic reconstruction Combination
Proof (diagram). [Diagram: the trees $T_1$ and $T_2$ together with $T$, $T_p$, $T_\mu$, $T_{po}$ and $T_\sigma$.]
92 Taxonomic reconstruction Combination
Definition. The intersection of two trees $T_1$ and $T_2$ obtained through minor embeddings $f_1 \colon T_1 \to T$ and $f_2 \colon T_2 \to T$ into a tree $T$ is the graph $T_p$ with set of nodes $V(T_p) = V(T_1) \cap V(T_2)$ and set of arcs defined in the following way: for every $a, b \in V(T_1) \cap V(T_2)$, $(a, b) \in E(T_p)$ if and only if there are paths $a \leadsto b$ in $T_1$ and in $T_2$ without intermediate nodes in $V(T_1) \cap V(T_2)$.
Example. Let $T$ be a tree with nodes $a_1, a_2, b, c$ and arcs $(a_1, a_2), (a_2, b), (a_2, c)$, let $T_1$ be its minor with nodes $a_1, b, c$ and arcs $(a_1, b), (a_1, c)$, and let $T_2$ be its minor with nodes $a_2, b, c$ and arcs $(a_2, b), (a_2, c)$. In this case $T_p$ is the graph with nodes $b, c$ and no arc, and in particular it is not a tree.
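The intersection can be computed directly from the definition. The sketch below assumes that the embeddings into $T$ are inclusions, so that a node shared by $T_1$ and $T_2$ carries the same name in both; the functions `tree_path` and `intersection` are hypothetical names, and the pairwise scan is quadratic rather than optimized.

```python
def tree_path(parent, u, v):
    """Path u ~> v as a list of nodes, or None if u is not an ancestor of v."""
    path = [v]
    while path[-1] != u:
        if path[-1] not in parent:       # climbed past the root without meeting u
            return None
        path.append(parent[path[-1]])
    return path[::-1]

def intersection(nodes1, arcs1, nodes2, arcs2):
    """Return (nodes, arcs) of the intersection T_p of two trees."""
    common = set(nodes1) & set(nodes2)
    par1 = {b: a for a, b in arcs1}
    par2 = {b: a for a, b in arcs2}

    def linked(parent, a, b):
        """True if a ~> b exists and has no intermediate node among the common ones."""
        p = tree_path(parent, a, b)
        return p is not None and not common & set(p[1:-1])

    arcs = {(a, b) for a in common for b in common
            if a != b and linked(par1, a, b) and linked(par2, a, b)}
    return common, arcs

# The example above: T_p has the nodes b and c and no arc.
tp = intersection({"a1", "b", "c"}, [("a1", "b"), ("a1", "c")],
                  {"a2", "b", "c"}, [("a2", "b"), ("a2", "c")])
print(tp)
```

With $T_1$ given by the arcs $(a, b), (b, c)$ and $T_2$ by the single arc $(a, c)$, the same routine returns the nodes $a, c$ and the arc $(a, c)$, since $b$ is not a common node.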
93 Taxonomic reconstruction Combination
Example. [Figure: the tree $T$ with arcs $(a_1, a_2), (a_2, b), (a_2, c)$; the tree $T_1$ with arcs $(a_1, b), (a_1, c)$; the tree $T_2$ with arcs $(a_2, b), (a_2, c)$; and the graph $T_p$ with nodes $b, c$ and no arc.]
94 Taxonomic reconstruction Combination
Theorem. For every two trees $T_1$ and $T_2$, any intersection of $T_1$ and $T_2$ obtained through x-embeddings into a smallest common x-supertree of them is a largest common x-subtree of $T_1$ and $T_2$.
Corollary. Every largest common x-subtree of a pair of trees $T_1$ and $T_2$ is, up to an isomorphism, the intersection of $T_1$ and $T_2$ obtained through their embeddings into a smallest common x-supertree.
95 Taxonomic reconstruction Combination
Definition. The x-join of two trees $T_1$ and $T_2$ obtained through x-embeddings $m_1 \colon T_\mu \to T_1$ and $m_2 \colon T_\mu \to T_2$ of a largest common x-subtree $T_\mu$ of them is the quotient graph $T_{po}$ of the disjoint sum $T_1 + T_2$ by the equivalence relation $\theta$ defined, up to symmetry, by the following condition: $(a, b) \in \theta$ if and only if $a = b$ or there exists some $c \in V(T_\mu)$ such that $a = m_1(c)$ and $b = m_2(c)$.
Example. Let $T_\mu$ be the graph with nodes $b, c$ and no arc, let $T_1$ be the tree with nodes $a_1, b, c$ and arcs $(a_1, b), (a_1, c)$, and let $T_2$ be the tree with nodes $a_2, b, c$ and arcs $(a_2, b), (a_2, c)$. The x-join $T_{po}$ of $T_1$ and $T_2$ through the obvious embeddings of $T_\mu$ is the graph with nodes $a_1, a_2, b, c$ and arcs $(a_1, b), (a_1, c), (a_2, b), (a_2, c)$, but it is not a smallest common x-supertree of $T_1$ and $T_2$, because $T_\mu$ is not a largest common x-subtree of $T_1$ and $T_2$ (it is not even a tree).
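The quotient construction amounts to tagging the nodes of the two trees, so that the sum is disjoint, and then renaming each node $m_2(c)$ to $m_1(c)$. The following sketch is an illustration only: `x_join` is a hypothetical name, the embeddings are given as dictionaries, each merged class is represented by its member from $T_1$, and only arcs are returned (isolated nodes are not tracked).

```python
def x_join(arcs1, arcs2, m1, m2):
    """Arcs of the quotient of T1 + T2 that identifies m1(c) with m2(c)."""
    # Nodes of the disjoint sum are tagged pairs (1, v) and (2, v); every merged
    # class is represented by its T1 member (1, m1[c]).
    rep = {(2, m2[c]): (1, m1[c]) for c in m1}

    def cls(tag, v):
        return rep.get((tag, v), (tag, v))

    arcs = {(cls(1, a), cls(1, b)) for a, b in arcs1}
    arcs |= {(cls(2, a), cls(2, b)) for a, b in arcs2}   # a set drops parallel arcs
    return arcs

# The example above, with T_mu embedded by the identity on b and c.
m = {"b": "b", "c": "c"}
tpo = x_join([("a1", "b"), ("a1", "c")], [("a2", "b"), ("a2", "c")], m, m)
print(sorted(tpo))
```

The result has the four arcs of the example, with $b$ and $c$ each receiving one arc from $a_1$ and one from $a_2$.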
96 Taxonomic reconstruction Combination
Example. [Figure: the graph $T_\mu$ with nodes $b, c$; the tree $T_1$ with arcs $(a_1, b), (a_1, c)$; the tree $T_2$ with arcs $(a_2, b), (a_2, c)$; and the x-join $T_{po}$ with nodes $a_1, a_2, b, c$.]
97 Taxonomic reconstruction Combination
Definition. The x-sum of two trees $T_1$ and $T_2$ obtained through x-embeddings $m_1 \colon T_\mu \to T_1$ and $m_2 \colon T_\mu \to T_2$ of a largest common x-subtree $T_\mu$ of them is the graph $T_\sigma$ obtained from the x-join $T_{po}$ of $T_1$ and $T_2$ by removing every arc that is subsumed by a path: that is, we remove from $T_{po}$ each arc $(v, w)$ for which there is another path $v \leadsto w$ in $T_{po}$.
Example. [Figure: the trees $T_\mu$, $T_1$, $T_2$, the x-join $T_{po}$ and the x-sum $T_\sigma$, on the nodes $a, b, c, d, e$.]
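Removing the arcs subsumed by paths can be sketched with a plain reachability test per arc. This is a direct, quadratic reading of the definition, assuming the x-join is acyclic; it is not the linear-time procedure used in the proof of the theorem. The name `remove_subsumed` and the example trees are hypothetical (they are not the ones in the figure).

```python
def remove_subsumed(arcs):
    """Drop every arc (v, w) for which another path v ~> w exists."""
    arcs = set(arcs)
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, set()).add(b)

    def reachable(src, dst, skip):
        """Depth-first search from src to dst that never uses the arc `skip`."""
        stack, seen = [src], {src}
        while stack:
            u = stack.pop()
            for v in succ.get(u, ()):
                if (u, v) == skip or v in seen:
                    continue
                if v == dst:
                    return True
                seen.add(v)
                stack.append(v)
        return False

    return {(v, w) for v, w in arcs if not reachable(v, w, (v, w))}

# Hypothetical x-join of T1 (a -> b, b -> d, b -> e) and T2 (a -> d, a -> e),
# merged along the common subtree with nodes a, d, e.
tpo = [("a", "b"), ("b", "d"), ("b", "e"), ("a", "d"), ("a", "e")]
tsigma = remove_subsumed(tpo)
print(sorted(tsigma))
```

In this example the arcs $(a, d)$ and $(a, e)$ are subsumed by the paths through $b$, so the x-sum coincides with $T_1$, which is indeed a common minor supertree of the two assumed trees.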
98 Taxonomic reconstruction Combination
Theorem. For every pair of trees $T_1$ and $T_2$, any x-sum of $T_1$ and $T_2$ is a smallest common x-supertree of them.
Corollary. Every smallest common x-supertree of a pair of trees $T_1$ and $T_2$ is, up to an isomorphism, the x-sum of $T_1$ and $T_2$ obtained through the embeddings of a largest common x-subtree into them.
99-101 Taxonomic reconstruction Combination
Theorem. The problems of finding a largest common x-subtree and a smallest common x-supertree of two trees, in each case together with a pair of witness x-embeddings, are reducible to each other in time linear in the size of the trees.
Proof. Given a largest common x-subtree $T_\mu$ of two trees $T_1$ and $T_2$ and a pair of witness x-embeddings $m_1 \colon T_\mu \to T_1$ and $m_2 \colon T_\mu \to T_2$, a smallest common x-supertree $T_\sigma$ of $T_1$ and $T_2$ can be obtained in time linear in the size of $T_1$ and $T_2$, as follows.
First, make copies $T_1'$ and $T_2'$ of $T_1$ and $T_2$, with $\ell_1 \colon T_1 \to T_1'$ and $\ell_2 \colon T_2 \to T_2'$ identity mappings.
Second, sum up $T_1'$ and $T_2'$ into a graph $T_\sigma$.
Third, for each $a \in V(T_\mu)$, merge nodes $\ell_1(m_1(a))$ and $\ell_2(m_2(a))$, and remove all parallel arcs.
Next, remove from $T_\sigma$ all arcs subsumed by paths, as follows. For each node $y \in V(T_\sigma)$ of in-degree 2, let $x, x' \in V(T_\sigma)$ be the source nodes of the two arcs coming into $y$. Perform a simultaneous traversal of the paths of arcs coming into $x$ and $x'$, until reaching node $x'$ along the first path or node $x$ along the second path. The traversal may stop along one of the paths, because a node of in-degree 0 or in-degree 2 is reached, while continuing along the other. Finally, remove from $T_\sigma$ either the arc $(x', y)$, if node $x'$ was reached along the first path, or the arc $(x, y)$, if node $x$ was reached along the second path.
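The last step of the proof, the simultaneous upward traversal, can be sketched as follows. This is a minimal reading of the procedure under explicit assumptions: the merged graph is acyclic and given as a list of arcs, in-degrees are updated as arcs are removed (the proof does not say whether they should be), and the name `prune_subsumed` is hypothetical.

```python
def prune_subsumed(arcs):
    """Remove arcs into in-degree-2 nodes that are subsumed by a path."""
    preds = {}
    for a, b in arcs:
        preds.setdefault(b, []).append(a)

    def climb(n):
        """Unique predecessor of n, or None at a node of in-degree 0 or 2."""
        p = preds.get(n, [])
        return p[0] if len(p) == 1 else None

    for y in [n for n, p in preds.items() if len(p) == 2]:
        x1, x2 = preds[y]
        u, v = x1, x2                  # u climbs from x1, v climbs from x2
        while u is not None or v is not None:
            u = climb(u) if u is not None else None
            v = climb(v) if v is not None else None
            if u == x2:                # x2 is an ancestor of x1: (x2, y) is subsumed
                preds[y].remove(x2)
                break
            if v == x1:                # x1 is an ancestor of x2: (x1, y) is subsumed
                preds[y].remove(x1)
                break
    return {(a, b) for b, ps in preds.items() for a in ps}

# Hypothetical merged graph: T1 (a -> b, b -> d, b -> e) plus T2 (a -> d, a -> e).
merged = [("a", "b"), ("b", "d"), ("b", "e"), ("a", "d"), ("a", "e")]
pruned = prune_subsumed(merged)
print(sorted(pruned))
```

Each climb follows only unique incoming arcs, so a walk stops as soon as it meets a node of in-degree 0 or 2, as described in the proof; when neither walk reaches the other source, both arcs into $y$ are kept.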
Reconstructing Trees from Subtree Weights Lior Pachter David E Speyer October 7, 2003 Abstract The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree
More informationEffects of Gap Open and Gap Extension Penalties
Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See
More informationRegular networks are determined by their trees
Regular networks are determined by their trees Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu February 17, 2009 Abstract. A rooted acyclic digraph
More informationMichael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D
7.91 Lecture #5 Database Searching & Molecular Phylogenetics Michael Yaffe B C D B C D (((,B)C)D) Outline Distance Matrix Methods Neighbor-Joining Method and Related Neighbor Methods Maximum Likelihood
More informationBuilding Phylogenetic Trees UPGMA & NJ
uilding Phylogenetic Trees UPGM & NJ UPGM UPGM Unweighted Pair-Group Method with rithmetic mean Unweighted = all pairwise distances contribute equally. Pair-Group = groups are combined in pairs. rithmetic
More informationEven Cycles in Hypergraphs.
Even Cycles in Hypergraphs. Alexandr Kostochka Jacques Verstraëte Abstract A cycle in a hypergraph A is an alternating cyclic sequence A 0, v 0, A 1, v 1,..., A k 1, v k 1, A 0 of distinct edges A i and
More informationReconstruction of certain phylogenetic networks from their tree-average distances
Reconstruction of certain phylogenetic networks from their tree-average distances Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA swillson@iastate.edu October 10,
More informationDNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi
DNA Phylogeny Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi Phylogenetics Grouping and Division of organisms Keeps changing with time Splitting, hybridization and termination Cladistics :
More informationIntegrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley
Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;
More informationChapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships
Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic
More informationInteger Programming in Computational Biology. D. Gusfield University of California, Davis Presented December 12, 2016.!
Integer Programming in Computational Biology D. Gusfield University of California, Davis Presented December 12, 2016. There are many important phylogeny problems that depart from simple tree models: Missing
More informationPerfect Phylogenetic Networks with Recombination Λ
Perfect Phylogenetic Networks with Recombination Λ Lusheng Wang Dept. of Computer Sci. City Univ. of Hong Kong 83 Tat Chee Avenue Hong Kong lwang@cs.cityu.edu.hk Kaizhong Zhang Dept. of Computer Sci. Univ.
More informationJed Chou. April 13, 2015
of of CS598 AGB April 13, 2015 Overview of 1 2 3 4 5 Competing Approaches of Two competing approaches to species tree inference: Summary methods: estimate a tree on each gene alignment then combine gene
More informationPhylogeny. November 7, 2017
Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related
More informationSolving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.
Solving the Maximum Agreement Subtree and Maximum Compatible Tree problems on bounded degree trees LIRMM, Montpellier France 4th July 2006 Introduction The Mast and Mct problems: given a set of evolutionary
More informationSupplementary Information
Supplementary Information For the article"comparable system-level organization of Archaea and ukaryotes" by J. Podani, Z. N. Oltvai, H. Jeong, B. Tombor, A.-L. Barabási, and. Szathmáry (reference numbers
More informationChapter 19: Taxonomy, Systematics, and Phylogeny
Chapter 19: Taxonomy, Systematics, and Phylogeny AP Curriculum Alignment Chapter 19 expands on the topics of phylogenies and cladograms, which are important to Big Idea 1. In order for students to understand
More informationGel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1
Gel Electrophoresis Used to measure the lengths of DNA fragments. When voltage is applied to DNA, different size fragments migrate to different distances (smaller ones travel farther). 10/28/0310/21/2003
More informationImproving divergence time estimation in phylogenetics: more taxa vs. longer sequences
Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal
More informationmolecular evolution and phylogenetics
molecular evolution and phylogenetics Charlotte Darby Computational Genomics: Applied Comparative Genomics 2.13.18 https://www.thinglink.com/scene/762084640000311296 Internal node Root TIME Branch Leaves
More informationAlgebraic Statistics Tutorial I
Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University June 9, 2012 Seth Sullivant (NCSU) Algebraic Statistics June 9, 2012 1 / 34 Introduction to Algebraic Geometry Let R[p] =
More information5 Quiver Representations
5 Quiver Representations 5. Problems Problem 5.. Field embeddings. Recall that k(y,..., y m ) denotes the field of rational functions of y,..., y m over a field k. Let f : k[x,..., x n ] k(y,..., y m )
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally
More informationPhylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz
Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels
More informationInvestigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST
Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and
More informationThe expected value of the squared euclidean cophenetic metric under the Yule and the uniform models
The expected value of the squared euclidean cophenetic metric under the Yule and the uniform models Gabriel Cardona, Arnau Mir, Francesc Rosselló Department of Mathematics and Computer Science, University
More informationLetter to the Editor. Department of Biology, Arizona State University
Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona
More informationComparative Genomics II
Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31 Outline Introduction Gene Families Pairwise Methods Phylogenetic Methods
More informationReading for Lecture 13 Release v10
Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................
More information