Visualising Phylogenetic Trees


Wan Nazmee Wan Zainon & Paul Calder
School of Informatics and Engineering
Flinders University of South Australia
PO Box 2100, Adelaide 5001, South Australia

Abstract

This paper describes techniques for visualising pairs of similar trees. Our aim is to develop ways of presenting the information so as to highlight both the common structure of the trees and their points of difference. The impetus for the work comes from the field of bioinformatics, where geneticists construct complex phylogenetic trees to represent the evolution of species or genes. But the techniques can also be used for other tree-structured data such as file systems, parse trees, decision trees, and organisational hierarchies. To investigate our techniques, we have built a prototype application that reads and displays phylogenetic trees in the popular Nexus format. The application incorporates a variety of interactive and automated visualisation techniques, and is implemented in Java. We are working with biologists to see how well the techniques work for real-world data.

Keywords: Interactive visualisation, phylogenetic trees, bioinformatics.

1 Introduction

Tree-structured data occurs in many domains: file systems, parse trees, organisational hierarchies, and classification schemes of many kinds. The impetus for the work described in this paper is the domain of phylogenetic classification, which is used by geneticists to describe possible evolutionary relationships between species or individuals. Although we have developed our techniques specifically for that domain, many of them could also be applied to other domains that use similar trees.

This paper presents techniques for visualising pairs of phylogenetic trees in order to emphasise the similarity of the trees while also highlighting how they differ.
We have implemented these techniques in the context of a prototype tool for interactively visualising phylogenetic trees, and are in the process of evaluating the effectiveness of the tool for real phylogenetic data.

Copyright 2006, Australian Computer Society, Inc. This paper appeared at the Seventh Australasian User Interface Conference (AUIC2006), Hobart, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 50. Wayne Piekarski, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

The remainder of this paper is organised as follows. Section 2 provides an introduction to the bioinformatics basis of phylogenetic trees and outlines other work that has investigated the visualisation and comparison of such data. Section 3 presents our approach to the problem and details several of the algorithms we use to compute visualisations. Section 4 briefly describes the implementation of the prototype visualisation tool, shows examples of its interface, and discusses its use.

2 Related Work

2.1 Bioinformatics Context

Biologists and geneticists use phylogenetic trees to represent the evolutionary interrelationships between collections of related species or genes. The discovery and analysis of those relationships may help in many practical applications such as drug discovery, forensics, disease control, and ecological modelling.

Biologists construct phylogenetic trees by examining the phenotypes or genotypes of a collection of organisms and attempting to infer the evolutionary process by which the organisms came to be. For example, a geneticist might obtain DNA sequence data from a range of species or from individuals within a population. Then, by comparing the sequences, she could infer how the sampled organisms might have evolved via a series of mutations, each caused by one change in the DNA sequence.
This hypothesised evolutionary history is then represented as a "tree of life" showing how possible ancestors could have led to the current organisms.

Bioinformaticists have devised a range of algorithms, based on strategies such as Maximum Likelihood (Felsenstein et al. 1982) and Maximum Parsimony (Farris 1983), for computing such phylogenetic trees. However, there is no "gold standard"; current practice dictates that several different methods be applied to the sequence data (Thorup and Farach 1994). When this happens, biologists often need to compare several similar trees in order to get a more complete picture of the relationships involved. A similar situation arises when several species have evolved in close association (co-evolution); the biologist might be interested in understanding how the phylogenetic tree for one species compares with that for the co-evolved species.

In its simplest form, a phylogenetic tree is drawn as a rooted binary tree. Each leaf node represents an actual species or organism; each internal node represents a hypothetical ancestor at which mutation is assumed to

have occurred (and which therefore has exactly two branches).

Figure 1: Fictitious phylogenetic trees (Tree 1 and Tree 2)

For example, Figure 1 shows two (clearly fictitious) trees that suggest two possible ways in which 4 present-day species might be related. Tree 1 implies that and diverged recently from a common ancestor, that the / ancestor and share a more distant common ancestry, and finally that the whole // tree split from the branch even further in the past. Tree 2, on the other hand, suggests that a common ancestor split into two branches, one ultimately leading to and and the other to and.

Real phylogenetic trees will of course be much larger and thus more complex. Understanding such trees requires visual inspection, structural comparison, and interactive manipulation and exploration, and thus presents a number of visualisation challenges (Carrizo 2004). Biologists faced with inadequate visualisation tools for comparing trees have had to rely instead on paper, tape, and highlighter pens (Munzner et al. 2003).

2.2 Tree Comparison Techniques

Bioinformaticists use a variety of techniques to compare phylogenetic trees. Section 3.1 describes how we apply and extend some of these techniques in visualising trees.

Consensus trees are widely used to summarise the agreement between a set of trees. A consensus tree represents a lowest common denominator of two or more trees; it depicts those aspects that the individual trees all agree on. Bryant (1997) reports on a variety of methods for creating a consensus tree, including the strict, majority rule, semistrict, and Nelson and Adams techniques. For example, the strict consensus tree of the trees in Figure 1 is as follows:

The consensus tree indicates that both trees agree that and had a recent common ancestor, but disagree about how and fit into the picture. The best that can be said is that and both shared a common ancestor with the / ancestor at some time in the past.
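The strict consensus described above can be illustrated with a small sketch. It is not the method of any particular package: it assumes trees are written as nested Python tuples, and because the leaf labels of Figure 1 are not available in the text, the labels `a` to `d` and the two tree shapes below are hypothetical stand-ins chosen to match the description (a three-level tree and a two-by-two tree). The function names are likewise illustrative.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return the leaf set of every internal node (clade) of the tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def strict_consensus_clusters(trees):
    """Keep only the clades that appear in every input tree."""
    common = clusters(trees[0])
    for tree in trees[1:]:
        common &= clusters(tree)
    return common


def build_tree(cluster, all_clusters):
    """Rebuild a (possibly unresolved) tree from a nested family of clusters."""
    # The children of a clade are the maximal clades strictly inside it,
    # plus any leaves that none of those clades cover.
    inner = [c for c in all_clusters if c < cluster]
    maximal = [c for c in inner if not any(c < d for d in inner)]
    covered = frozenset().union(*maximal) if maximal else frozenset()
    children = [build_tree(c, all_clusters) for c in sorted(maximal, key=sorted)]
    children += sorted(cluster - covered)
    return tuple(children)


# Hypothetical stand-ins for the two trees of Figure 1.
TREE_1 = ((("a", "b"), "c"), "d")
TREE_2 = (("a", "b"), ("c", "d"))

COMMON = strict_consensus_clusters([TREE_1, TREE_2])
CONSENSUS = build_tree(leaves(TREE_1), COMMON)
print(CONSENSUS)  # (('a', 'b'), 'c', 'd')
```

Under these assumptions the only clade shared by both trees (apart from the root) is the pair `a`, `b`, so the consensus root has three branches, which mirrors the unresolved interior node discussed in the text.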
Note that a consensus tree includes all of the original leaf nodes, but is normally not fully resolved; areas of disagreement generally result in interior nodes with more than two branches.

An agreement subtree is a subtree that is common to two or more trees. Conceptually, a subtree can be obtained by pruning leaf nodes (and collapsing the parent internal nodes) from the original tree. An agreement subtree is a subtree that can be extracted in such a manner from all of the trees. A greatest agreement subtree (GAS) is an agreement subtree with the greatest number of leaf nodes. For example, the trees in Figure 1 have two greatest agreement subtrees:

Note that a greatest agreement subtree does not normally include all of the leaf nodes (unless, of course, the trees are identical).

A triplet is a 3-node subtree and represents the smallest informative subtree of a rooted tree. The structure of a tree is fully characterised by enumerating the structure of its triplets. For example, Tree 1 has the following triplets:

Triplets can be used as a basis for quantifying the difference between rooted trees. Using this approach, the structural difference between two trees is the number of triplets whose structure is different in the two trees. For example, Tree 2 has the following triplets. Since 2 triplets (the second and third) are different from the corresponding triplets in Tree 1, the structural triplet difference between the two trees is 2.

The nearest neighbour interchange (NNI) technique (Robinson 1971) is also used to quantify the difference between trees. A nearest neighbour interchange is an interchange of two nearest neighbour branches. The NNI difference between two trees is the minimum number of such interchanges needed to convert one tree into the other. NNI is usually applied to unrooted trees, but can be adapted for rooted trees. For rooted trees, the nearest neighbour of a branch is one of the sub-branches (if they exist) of its sibling.
For example, in Tree 1 the nearest neighbours of the branch are the and branches, and the nearest neighbours of the branch are the branch and the (unlabelled) common / ancestor branch.
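The rooted form of the interchange described above can be sketched as follows. This is an illustration only: trees are assumed to be nested tuples of binary nodes, the leaf labels `a` to `d` are hypothetical stand-ins for the unlabelled species of Figure 1, and `canon` and `nni_neighbours` are names introduced here.

```python
def canon(tree):
    """Canonical form of an unordered tree, so that equal shapes compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canon(child) for child in tree), key=repr))


def nni_neighbours(tree):
    """Return the canonical forms of all trees one rooted NNI step away."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    result = set()
    # Interchanges at this node: swap a branch with one of the two
    # sub-branches of its sibling (only possible if the sibling is internal).
    for branch, sibling in ((left, right), (right, left)):
        if not isinstance(sibling, str):
            sub_1, sub_2 = sibling
            result.add(canon((sub_1, (branch, sub_2))))  # branch swapped with sub_1
            result.add(canon((sub_2, (sub_1, branch))))  # branch swapped with sub_2
    # Interchanges that happen entirely inside one of the two branches.
    for changed in nni_neighbours(left):
        result.add(canon((changed, right)))
    for changed in nni_neighbours(right):
        result.add(canon((left, changed)))
    return result


# Hypothetical stand-ins for the two trees of Figure 1.
TREE_1 = ((("a", "b"), "c"), "d")
TREE_2 = (("a", "b"), ("c", "d"))

NEIGHBOURS = nni_neighbours(TREE_1)
print(len(NEIGHBOURS))              # 4
print(canon(TREE_2) in NEIGHBOURS)  # True
```

With these stand-in trees, four distinct trees are one interchange away from the first tree and the second tree is among them, so their NNI difference under this definition is 1.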

Using this definition, the following trees can all be obtained by one NNI step from Tree 1. Since one of these (the bottom right) is structurally identical with Tree 2, the NNI difference between the two trees is 1.

2.3 Tools for Phylogenetic Tree Analysis and Visualisation

Biologists use many applications to analyse and understand phylogenetic data. This section briefly describes four of the most popular tools that are freely available over the Internet. A comprehensive list of other tools is provided on the PHYLIP web site (Felsenstein 2005).

Gibas and Jambeck (2001) report that the most widely used phylogenetic analysis package is PHYLIP (Felsenstein 2005), which contains more than 30 programs that implement different phylogenetic algorithms. It has programs for tree plotting, heuristic tree search, interactive tree manipulation, and other phylogenetic analysis methods. Table 1 shows a list of PHYLIP programs that users are most likely to use to analyse protein and DNA sequence data.

Table 1: Selected PHYLIP programs

Program     Use
PROTPARS    Infers phylogenies from protein sequences using parsimony method
NEIGHBOR    Infers phylogenies from distance matrix data using either pairwise clustering or neighbour joining methods
DRAWGRAM    Draws a rooted tree based on output from one of the phylogeny inference programs
CONSENSE    Computes a consensus tree from a group of phylogenies
RETREE      Allows interactive manipulation of a tree

The COMPONENT application (Roderic 1993) can both display and analyse phylogenetic trees. Its emphasis is on computing comparative metrics between trees, although it includes simple interactive editing operations such as rearranging tree branches, deleting nodes, and rerooting trees.

Mesquite (Maddison and Maddison 2005) is a system that its developers describe as a modular system for evolutionary analysis. Available modules include components for construction and comparison of phylogenetic trees.
The TreeSet Visualisation module (Klinger and Amenta 2002) produces point-set visualisations that suggest clustering within large sets of trees.

TreeJuxtaposer (Munzner et al. 2003) supports structural comparison of trees. The tool can highlight parts of several trees that are structurally similar, although its emphasis is on efficiently handling very large trees (up to several hundred thousand nodes) rather than on identifying the specific differences between the trees.

3 Visualising Tree Differences

Our approach to visualising tree similarities and differences makes use of the fact that a tree with unordered branches can be drawn in many arrangements. In a phylogenetic tree, the order in which branches appear is usually less important than the structural relationships between nodes. In such cases, we can take advantage of this flexibility to draw a pair of trees so as to highlight both their similarities and differences.

Our technique is to draw the pair of trees face-to-face, with the arrangement of each tree chosen to best emphasise the similarities and highlight the differences. For example, the trees in Figure 1 could be drawn as follows:

This arrangement shows the greatest agreement subtree (,, ) and also how the differing node () connects in the two trees. In essence, it suggests that in one case diverged from the / line, whereas in the other it diverged from the line.

Typical phylogenetic trees can often have 5 or more nodes, and since the number of possible arrangements of a fully resolved tree of size n is $2^{n-1}$ (each of the n-1 internal nodes can be drawn with its two branches in either order), it is usually impractical to manually determine the best arrangement. To help in the process we have considered several strategies for automatically arranging the trees.

The minimum triplet difference (MTD) algorithm computes arrangements of two trees for which the difference, as measured by triplet arrangement pattern, is minimised.
The maximum branch similarity (MBS) algorithm arranges one tree so that its branches have as many leaf nodes as possible in common with the corresponding branch in the other tree.

The all-but-n (ABn) algorithm attempts to arrange the common structures of the two trees so that the nodes that differ can be drawn in alignment.

3.1 Minimum Triplet Difference

Nodes in a triplet can be labelled in 3 distinct ways, and there are 4 distinct arrangements for each labelling, making a total of 12 possible labelled triplet patterns, as shown in Figure 2.

Figure 2: Labelled triplet arrangement patterns (twelve patterns, labelled A to L)

The nodes in the figure are labelled to suggest how the triplet pattern is assigned to a particular labelled tree. The label is assigned to the tree node with the lowest ordinal number (in some domain-specific ordering) of the three triplet nodes. Similarly, the label is assigned to the tree node with the highest ordinal number, and the label is assigned to the tree node with the intermediate ordinal number. For example, using an alphabetic ordering for the labels in the trees of Figure 1 and considering the triplet (,, ), label would map to (the label with the lowest ordering), to, and to (the highest ordering). Thus this triplet in Tree 1 would match pattern A, and the same triplet in Tree 2 would match pattern J.

The triplet difference between two trees is computed by considering all triplets and counting the number of triplets for which the pattern in the two trees is different. For example, Table 2 lists the triplet patterns for all four of the triplets in the trees of Figure 1. Since three of the triplets have different patterns in the two trees, the triplet difference for these tree arrangements is 3.

Table 2: Triplet patterns for Tree 1 and Tree 2

Triplet    Tree 1 pattern    Tree 2 pattern
(,, )      A                 J
(,, )      A                 D
(,, )      A                 D
(,, )      G                 G

Table 3: Triplet difference matrix for all possible arrangements of Tree 1 and Tree 2 (axes: Tree 1 arrangement, Tree 2 arrangement)

The minimum triplet difference (MTD) algorithm finds an arrangement for each tree that minimises the triplet difference.
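An exhaustive form of this search can be sketched as follows. The sketch makes several assumptions that are not taken from the prototype: trees are nested tuples whose first element is the upper branch, the leaf labels `a` to `d` and both tree shapes are hypothetical stand-ins for Figure 1, and a triplet's pattern is encoded simply as the drawn three-leaf subtree (which, like the A to L patterns, distinguishes 12 cases per triplet) rather than by the letters of Figure 2.

```python
from itertools import combinations, product


def arrangements(tree):
    """All drawings of a binary tree: every internal node may be flipped."""
    if isinstance(tree, str):
        return [tree]
    upper, lower = tree
    result = []
    for u, l in product(arrangements(upper), arrangements(lower)):
        result.append((u, l))
        result.append((l, u))
    return result


def leaf_order(tree):
    """Leaf labels from top to bottom of the drawing."""
    if isinstance(tree, str):
        return [tree]
    return leaf_order(tree[0]) + leaf_order(tree[1])


def restrict(tree, keep):
    """Prune the tree to the leaves in `keep`, collapsing one-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def triplet_patterns(tree):
    """Map each 3-leaf combination to its drawn pattern in this arrangement."""
    names = sorted(leaf_order(tree))
    return {t: restrict(tree, set(t)) for t in combinations(names, 3)}


def triplet_difference(tree_1, tree_2):
    """Count the triplets whose drawn pattern differs between the two trees."""
    p1, p2 = triplet_patterns(tree_1), triplet_patterns(tree_2)
    return sum(p1[t] != p2[t] for t in p1)


def minimum_triplet_difference(tree_1, tree_2):
    """Try every pair of arrangements; return the minimum and the best pairs."""
    best, best_pairs = None, []
    for a1 in arrangements(tree_1):
        for a2 in arrangements(tree_2):
            diff = triplet_difference(a1, a2)
            if best is None or diff < best:
                best, best_pairs = diff, [(a1, a2)]
            elif diff == best:
                best_pairs.append((a1, a2))
    return best, best_pairs


# Hypothetical stand-ins for the drawn trees of Figure 1.
TREE_1 = ("d", (("a", "b"), "c"))
TREE_2 = (("a", "b"), ("c", "d"))

BEST, BEST_PAIRS = minimum_triplet_difference(TREE_1, TREE_2)
print(triplet_difference(TREE_1, TREE_2), BEST, len(BEST_PAIRS))  # 3 2 8
```

Because the search visits every pair of arrangements, its cost grows as the product of the two arrangement counts, so this form is only practical for very small trees or for a selected branch.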
In principle, the algorithm considers each possible arrangement of each of the two trees, then chooses the pair of arrangements for which the triplet difference is smallest. In general, there may be many pairs of arrangements with the same minimum triplet difference; MTD does not specify which such pair should be chosen. For example, Table 3 lists the triplet difference for each of the 8 possible arrangements of Tree 1 and Tree 2. In this case the minimum difference is 2, which is achieved by 8 pairs, of which one is as follows:

3.2 Maximum Branch Similarity

The maximum branch similarity (MBS) algorithm arranges one tree so that the branches of each internal node have the largest number of leaf nodes in common with the corresponding branches of the equivalent node in the other tree. For example, consider the original arrangements of the trees in Figure 1. The set of leaf nodes comprised by the upper branch of the root node of Tree 1 is {}, and the

set comprised by the lower branch is {,, }. Similarly, the Tree 2 root node upper branch comprises {, } and lower branch {, }. Thus for this arrangement there are no nodes common to the upper branches, and only one () common to the lower branches, for a total common node count of 1. However, if the branches of the Tree 2 root node were exchanged, then the upper branches would have 1 common node () and the lower branches would have 2 common nodes ( and ), for a total common node count of 3. Thus MBS indicates that the root node of Tree 2 should be flipped (its branches swapped), giving the following arrangement:

The algorithm then recursively considers the upper and lower children of the original nodes, ultimately terminating at the leaf nodes. In this simple example, no further swaps occur since the upper branch of Tree 1 is already a leaf, and since flipping the lower branch of Tree 2 would not result in an increase in the number of common nodes (both alternatives have only 1 node in common).

3.3 All-But-n

We have explored a class of algorithms, which we call All-But-n (ABn), that can arrange trees to maximise leaf node alignment in a face-to-face display where the GAS of the two trees is almost as large as the trees themselves (in other words, where the trees differ with respect to just a few nodes).

The simplest situation (AB1) occurs for trees for which the GAS includes all but one node. In this case, the aim of the algorithm is to choose an arrangement for the GAS so that, when the differing node is re-inserted into the tree (which will be in a different position in the two trees), the differing nodes will be aligned. For example, the trees of Figure 1, which have a GAS that excludes the single node, could be drawn as follows.

AB1 partitions the GAS into three components at the nearest common ancestor (NCA) of the points in the two original trees at which the differing node is attached to the GAS.
The component above the NCA (the outer tree) plays no further part in the algorithm. The algorithm proceeds by arranging the upper and lower inner branches of the NCA so that the missing node attachment point for one tree is on the lower boundary of the upper inner branch, while for the other tree it is on the upper boundary of the lower inner branch. Then, when the two trees are constructed around face-to-face copies of the GAS, the missing node insertion points will coincide.

Since it is always possible to arrange a tree so that any one particular node is on the tree boundary, it is always possible to achieve this arrangement when the GAS is only one node short of the full trees. When more than one node must be pruned (and subsequently reinserted), the situation is more complex; sometimes full alignment can be achieved, but sometimes only partial alignment is possible. A full explanation of the ABn algorithm is beyond the scope of this paper.

4 A Visual Tree Comparison Tool

We have implemented a prototype application for visualising pairs of phylogenetic trees and used it as a vehicle for developing and evaluating our ideas. The application is implemented in Java using the Swing components.

Figure 3 shows the prototype tool displaying two 5-node trees. The program can read standard Nexus-format tree files (David et al. 1997) and display a selected pair of trees. It provides controls for specifying basic parameters of the tree display, including the separation between branches and the depth of each node. The information display area at the bottom of the window provides basic information about the trees and is used largely for debugging.

Figure 3 shows the trees displayed in the raw arrangement specified in the Nexus file; in this example, that arrangement does not make it easy to compare the trees. However, the node connection display (between the two trees), which visually connects common leaf nodes in the two trees, provides some indication of similarities in the trees.
Horizontal connection lines (coloured green in the application) indicate nodes whose vertical position is the same in the two trees. Clearly, if the two trees (or parts of the trees) are identical, then they can be drawn so that all nodes are aligned, in which case the connection display would consist entirely of parallel horizontal lines. In Figure 3, few nodes are aligned (the exception is a group of 3 towards the top of the display). Slanted connection lines (coloured red or yellow depending on whether the position of the node in the left tree is higher or lower than that in the right tree) indicate nodes that are not aligned. However, parallel slanted lines indicate groups of nodes whose relative positions are the same in the two trees, suggesting a similar structure for those groups in the two trees. Figure 3 shows several such groups.
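The classification of connection lines described above can be sketched in a few lines. This is not the prototype's code: it assumes each leaf occupies one evenly spaced row (row 0 at the top) with no inserted gaps, uses hypothetical leaf labels `a` to `d`, and reports a neutral category instead of a colour because the text does not say which slant direction is red and which is yellow.

```python
from collections import defaultdict


def leaf_order(tree):
    """Leaf labels from top to bottom of a nested-tuple tree drawing."""
    if isinstance(tree, str):
        return [tree]
    order = []
    for child in tree:
        order.extend(leaf_order(child))
    return order


def connection_lines(left_tree, right_tree):
    """Classify the line joining each common leaf in the two drawings."""
    left_rows = {name: row for row, name in enumerate(leaf_order(left_tree))}
    right_rows = {name: row for row, name in enumerate(leaf_order(right_tree))}
    lines = {}
    for name, left_row in left_rows.items():
        if name not in right_rows:
            continue  # leaf is missing from the right tree: nothing to connect
        offset = right_rows[name] - left_row
        if offset == 0:
            kind = "aligned"      # horizontal line
        elif offset > 0:
            kind = "left_higher"  # smaller row number means higher on screen
        else:
            kind = "left_lower"
        lines[name] = (kind, offset)
    return lines


def parallel_groups(lines):
    """Group slanted lines that share a slope (the same row offset)."""
    groups = defaultdict(list)
    for name, (kind, offset) in lines.items():
        if kind != "aligned":
            groups[offset].append(name)
    return {offset: names for offset, names in groups.items() if len(names) > 1}


# Hypothetical pair of drawings (not the trees of Figure 3).
LEFT = ("d", (("a", "b"), "c"))
RIGHT = (("a", "b"), ("c", "d"))

LINES = connection_lines(LEFT, RIGHT)
print(LINES["d"])              # ('left_higher', 3)
print(parallel_groups(LINES))  # {-1: ['a', 'b', 'c']}
```

In this example the three leaves `a`, `b`, and `c` share the same offset, which corresponds to the groups of parallel slanted lines that the text identifies as a sign of similar local structure.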

Figure 3: Visualisation tool interface

Figure 4: Collapsing interior nodes

4.1 Using The Application

To rearrange the trees (in order to better compare them), the user can use a combination of manual interaction and automatic rearrangement. The palette on the left of Figure 3 includes tools for interactively modifying the tree appearance, including selecting tree nodes, collapsing selected branches, controlling the spacing between branches of a node, swapping the upper and lower branches of a node, and manually setting branch colours and line thicknesses.

The collapse tool is used to temporarily hide various parts of the tree, as shown in Figure 4. Collapsing nodes enhances visibility, especially for larger trees, because it enables the user to focus on specific parts of the tree while ignoring other parts. Collapsed nodes can subsequently be expanded (and themselves arranged) once their containing structure has been dealt with.

The insert gap and decrease gap tools are used to adjust the space between branches in order to arrange a group of nodes so that they are located at the same level in both of the trees. The flip tool is used to swap the positions of the branches of a given interior node, which allows manual manipulation of the tree arrangement and may provide a simpler view of the tree structures.

The visualisation tool currently implements the MTD and MBS automatic rearrangement algorithms, but not the ABn algorithm. To apply the algorithms, the user selects a branch (or perhaps the entire tree) in both left and right trees, then invokes the desired algorithm. The application computes the new arrangements, then redraws the trees with the selected nodes rearranged.

4.2 Evaluation

Informal evaluation of our prototype visualisation tool has shown that a combination of automatic rearrangement and manual rearrangement is often effective in rapidly generating an arrangement that facilitates tree comparison, even for quite large trees.
For example, Figure 5 shows an arrangement of the trees in Figure 3 for which most nodes are aligned. The arrangement was achieved by a combination of MBS (applied to the whole trees to align high-level structure), MTD (to sort out the tangles indicated by groups of nearly parallel connecting lines), manual node flipping (to fine-tune a few branches), and manual gap insertion (to move relatively aligned groups into absolute alignment).
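The MBS step used in this workflow, as described in Section 3.2, can be sketched as a short recursive function. The sketch follows the textual description rather than the prototype's Java source: trees are assumed to be nested tuples of the form (upper branch, lower branch), the labels `a` to `d` are hypothetical stand-ins for the species of Figure 1, and ties are resolved by leaving a node unflipped.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def common(branch_1, branch_2):
    """Number of leaf nodes that two branches have in common."""
    return len(leaves(branch_1) & leaves(branch_2))


def mbs(reference, tree):
    """Rearrange `tree` so its branches best match those of `reference`."""
    # Stop as soon as either side is a leaf: there is nothing left to flip.
    if isinstance(reference, str) or isinstance(tree, str):
        return tree
    ref_upper, ref_lower = reference
    upper, lower = tree
    keep_score = common(ref_upper, upper) + common(ref_lower, lower)
    flip_score = common(ref_upper, lower) + common(ref_lower, upper)
    if flip_score > keep_score:
        upper, lower = lower, upper  # flip only when it strictly helps
    # Recurse on the corresponding upper and lower children.
    return (mbs(ref_upper, upper), mbs(ref_lower, lower))


# Hypothetical stand-ins for the drawn trees of Figure 1.
TREE_1 = ("d", (("a", "b"), "c"))
TREE_2 = (("a", "b"), ("c", "d"))

ARRANGED = mbs(TREE_1, TREE_2)
print(ARRANGED)  # (('c', 'd'), ('a', 'b'))
```

With these stand-in trees the root comparison gives a common node count of 1 unflipped and 3 flipped, so the root of the second tree is flipped and no deeper flips occur, which matches the worked example in Section 3.2.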

Figure 5: An arrangement with greater alignment

Note, however, that alignment of nodes does not necessarily indicate commonality of structure, although it does make it much easier to see such commonality. Figure 6 shows the same arrangement as Figure 5, but with common leaf-level branches highlighted in colour. The colouring algorithm finds nodes that have the same siblings in both trees, then recursively examines their parents. Note that not all aligned nodes have common structure (although most do) and that not all nodes with common structure are aligned (although most are).

Our current investigations suggest that the combination of alignment (to simplify the display) and colouring (to identify common structures) is a promising way to understand the two trees. We are working with our bioinformaticist colleagues to validate and further develop our ideas and to determine whether interactive visualisation is a viable technique for data of this kind.

5 Conclusion

Information visualisation can play a major role in the analysis of phylogenetic data by allowing geneticists to visually compare and therefore better understand their data. We have developed and are in the process of evaluating a prototype tool that domain specialists who deal with phylogenies can use to help understand the data that they confront. Although we have not yet tested them there, we believe that our ideas will also be of value in other domains where similarly structured data is used, and where comparisons are key in understanding the implications of that data.

Acknowledgements

We gratefully acknowledge the contribution of Rejmond Sejic, who built an early version of the prototype tool and implemented the MTD algorithm as part of his Honours project (Sejic 2004). Thank you also to our School of Biological Sciences colleagues Dr Cathy Abbott and Assoc. Prof. Mike Schwarz for their valuable insights and bioinformatics expertise.

References

Carrizo, S. F.
(2004): Phylogenetic Trees: An Information Visualisation Perspective. In Proc. 2nd Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand, Australian Computer Society, Inc.

Klinger, J. and Amenta, N. (2002): Case Study: Visualizing Sets of Evolutionary Trees. In Proc. IEEE Symposium on Information Visualization, Boston, Massachusetts, USA.

Roderic, D. M. (1993): Component 2. User Guide, ml (last accessed 8/8/25).

Bryant, D. (1997): Building Trees, Hunting for Trees and Comparing Trees. Ph.D. Thesis, University of Canterbury.

Sejic, R. (2004): Visual Comparison of Phylogenetic Trees. Honours Thesis, Flinders University of South Australia.

Gibas, C. and Jambeck, P. (2001): Bioinformatics Computer Skills. O'Reilly, USA.

Figure 6: The final presentation

David, R. M., David, L. S., and Wayne, P. M. (1997): NEXUS: An Extensible File Format for Systematic Information. Systematic Biology, 46(4):59, 62.

Munzner, T., Guimbretiere, F., Tasiran, S., Zhang, L. and Zhou, Y. (2003): TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context with Guaranteed Visibility. In Proc. SIGGRAPH 2003.

Thorup, M. and Farach, M. (1994): Fast Comparison of Evolutionary Trees. In Proc. 5th Annual ACM-SIAM Symposium on Discrete Algorithms.

Maddison, W. P. and Maddison, D. R. (2005): Mesquite: a modular system for evolutionary analysis. Version

Felsenstein, J., Sawyer, S., and Kochin, R. (1982): An efficient method for matching nucleic acid sequences. Nucleic Acids Research 1(1):

Farris, J. S. (1983): The logical basis of phylogenetic analysis. In Advances in Cladistics, Platnick N.I. & Funk V.A., eds, pp Columbia Uni. Press, New York.

Robinson, D. F. (1971): Comparison of labeled trees with valency three, J. Combin. Theory 11:

Felsenstein, J. (accessed 27/1/25): PHYLIP web site.


(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise Bot 421/521 PHYLOGENETIC ANALYSIS I. Origins A. Hennig 1950 (German edition) Phylogenetic Systematics 1966 B. Zimmerman (Germany, 1930 s) C. Wagner (Michigan, 1920-2000) II. Characters and character states

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Parsimony via Consensus

Parsimony via Consensus Syst. Biol. 57(2):251 256, 2008 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150802040597 Parsimony via Consensus TREVOR C. BRUEN 1 AND DAVID

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas.

Solving the Maximum Agreement Subtree and Maximum Comp. Tree problems on bounded degree trees. Sylvain Guillemot, François Nicolas. Solving the Maximum Agreement Subtree and Maximum Compatible Tree problems on bounded degree trees LIRMM, Montpellier France 4th July 2006 Introduction The Mast and Mct problems: given a set of evolutionary

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information
