Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Similar documents
Theory of Evolution. Charles Darwin

Theory of Evolution Charles Darwin

Constructing Evolutionary/Phylogenetic Trees

BINF6201/8201. Molecular phylogenetic methods

Algorithms in Bioinformatics

Evolutionary Tree Analysis. Overview

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.


Phylogeny Tree Algorithms

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Phylogeny: building the tree of life

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

A Phylogenetic Network Construction due to Constrained Recombination

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

EVOLUTIONARY DISTANCES

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Constructing Evolutionary/Phylogenetic Trees

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Phylogenetic trees 07/10/13

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Phylogenetics. BIOL 7711 Computational Bioscience

Reading for Lecture 13 Release v10

Phylogenetic inference

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Lecture 11 Friday, October 21, 2011

Phylogenetic Tree Reconstruction

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Dr. Amira A. AL-Hosary

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Concepts and Methods in Molecular Divergence Time Estimation

8/23/2014. Phylogeny and the Tree of Life

Evolutionary Models. Evolutionary Models

Organizing Life s Diversity

A (short) introduction to phylogenetics

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Multiple Sequence Alignment. Sequences

Inferring Molecular Phylogeny

What is Phylogenetics

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?

PHYLOGENY AND SYSTEMATICS

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

Biology 211 (2) Week 1 KEY!

Molecular Evolution and Phylogenetic Tree Reconstruction

How should we organize the diversity of animal life?

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Seuqence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification

Chapter 27: Evolutionary Genetics

Gel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Macroevolution Part I: Phylogenies

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Sequence Alignment (chapter 6)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Pairwise & Multiple sequence alignments

Warm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab

Phylogeny. November 7, 2017

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

How to read and make phylogenetic trees Zuzana Starostová

Chapter 3: Phylogenetics

Cladistics and Bioinformatics Questions 2013

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Skulls & Evolution. Procedure In this lab, groups at the same table will work together.

Introduction to Bioinformatics Introduction to Bioinformatics

Phylogenetic Trees. How do the changes in gene sequences allow us to reconstruct the evolutionary relationships between related species?

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

In a way, organisms determine who belongs to their species by choosing with whom they will! MODERN EVOLUTIONARY CLASSIFICATION 18-2 MATE

Chapter 26 Phylogeny and the Tree of Life

Phylogenetics: Building Phylogenetic Trees

Classification and Phylogeny

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

PHYLOGENY & THE TREE OF LIFE

Phylogeny and Systematics

Introduction to characters and parsimony analysis

Chapter 26: Phylogeny and the Tree of Life

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Classification and Phylogeny

Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Chapter 26 Phylogeny and the Tree of Life

Transcription:

Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny 6. onclusions Why build evolutionary tree? Understand the lineage of different species. Have an organizing principle for sorting species into a taxonomy. Understand how various functions evolved. Understand forces and constraints on evolution. To do multiple alignment. Types of trees istance based = tree is constructed using distances between species/objects (number of mutations, time, other distance metric). haracter based = tree is constructed based on the acquisition or loss of traits. Not connected explicitly to any measure of distance. Roots Rooted trees show a single common ancestor for every object within the tree. Requires more information. Unrooted trees show objects as leafs, and internal nodes are some common ancestors, but there is insufficient information to tell whether or not a given internal node is a common ancestor of any leaves.

Page Ultrametric Matrix and Tree E 0 8 8 0 8 8 0 8 8 0 E 8 If M is nxn symmetric matrix, then ultrametric tree T for M has properties:. T contains n leaves each corresponding to a row of M. Each internal node of T is labeled by one entry from M and has children. The numbers along a path from root to leaf strictly decrease. M(i,j) is label of least common ancestor of i and j efinition of Ultrametric matrix M is an ultrametric distance matrix, iff for every three indices i, j and k there is a tie for the maximum of M(i,j), M(i,k) and M(j,k). That is, the maximum is not unique. i v u j k M(i,k) = M(j,k) = u M(i,j) = v Ultrametric Tree can be constructed in O(n ) Evolutionary trees as ultrametric trees M(i,j) can be considered the time since i and j diverged in evolution: and had last common ancestor 8 million years ago E 8 E 0 8 8 0 8 8 0 8 8 0 Thus, T becomes an evolutionary tree. Molecular lock Theory (Zuckerkandl & Pauling, early 960 s) For any given protein, accepted mutations in the amino acid sequence for the protein occur at a constant rate. ccepted = mutations that allow protein to function without death. Theory implies that the number of accepted mutations occurring in a time interval is proportional to the length of the interval. Molecular clock theory II. Rate of accepted mutations may be different for different proteins (depending on their tolerance for mutations).. ifferent parts of a protein may evolve at different rates.

Page Molecular clock theory III The clock simplifies task of collecting ultrametric data: If and differ by k accepted mutations, then roughly k/ mutations have occured since divergence. The k/ mutations are proportional to the times since divergence (by clock theory), and so full matrix of data can be collected. How can we collect this data?. Laboratory methods: mix single strands of N of related genes from different species, and measure how tightly they associate. The more complementary the N, the tighter the association. Used to study bird evolution, found nearly perfect ultrametric data. Thus, N clock ticks at same rate for all bird species... E 0 8 8 0 8 8 0 8 8 0 How can we collect this data?. Sequence analysis methods: Estimate number of mutations based on sequence comparisons, using a version of edit distance. Need to correct for multiple mutations, neutral mutations of N. E 0 8 8 0 8 8 0 8 8 0 Ultrametric Matrices Most data is not really ultrametric. If data are ultrametric, then strong evidence for a molecular clock within that data, and that evolutionary tree is likely to be correct. Un-ultrametric data sometimes can be fitted to closest ultrametric matrix nearby. dditive Trees Ultrametric are the best, but if data doesn t fit, can go to weaker test: additive matrix. Given M matrix, T is an additive tree if * T has at least n nodes, with n rows of M * for each (i,j) path from i to j has weight M(i,j) 0 7 9 0 6 8 0 6 0 dditive trees and evolution Ultrametric trees are additive, but additive trees are not necessarily ultrametric. (NOTE: If an additive tree has a node v such that all leaves have the same distance to v, then it is ultrametric.) dditive trees don t have a root, but just indicate branching patterns, with no implied direction.

Page When data is imperfect... Other distance methods:. Plain old cluster analysis--group species by similarity, form groups of related species. E.g. hierarchical cluster analysis (See Mount, UPGM, p 6).. Least squares fitting of an error function that is minimal when observed and estimated distances on tree match. (See Mount, NEIGHOR JOINING, p 60) haracter-based trees Very often, we don t have distance metrics between the species of interest. ut we do have a set of observable features. an use these features as the basis for building a tree. Rows = objects olumns = traits haracter-based trees = object has trait (character) 0 = object does not have trait Object has all traits on path from the root. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 haracter-based trees interpreted as evolutionary trees. Root of the tree represents an ancestral object that has none of the present m characters = [0 0 0... 0].. Each of the characters changes from zero to one exactly once and never changes back. Thus, each character labels one edge. Gives an evolutionary history by mutation event, but not time. E E Where do characters (traits) come from? Traditionally, morphological features like: has backbone has feathers loss of a certain bone in eye socket has a certain amino acid at position i walks on knuckles has certain nonfunctional N word in sequence whether a certain gap is present in a multiple alignment whether or not protein X regulates protein Y

Page Perfect phylogeny problem Given the n x m binary matrix M, determine whether there is a tree for M, build it. From merican Museum of Natural History http://www.amnh.org/exhibition/fossil_halls/cladistics.html Theorem: Matrix M has a phylogenetic tree iff every pair of columns i,j the traits of i and j are either disjoint or one contains the other. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O(nm) algorithm for solving perfect phylogeny problem. onsider each column of M as a binary numbers. Sort numbers in decreasing order, largest in column.. New matrix = M and each trait is column #.. For each row, construct the string of traits it has.. uild a keyword tree T for the n strings from step.. Test whether T is a perfect phylogeny for M. Keyword tree Rooted directed tree, edge labeled with character, branches with different labels, paths spell out keywords. an be used to solve dictionary problem--determine if a string is in the dictionary. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 E 0 0 0 0 Old trait labels New trait labels o t a t o s p c h o o l e t r y t t e r y i e n c e 0 0 0 0 0 0 0 0 0 0 0 0 E 0 0 0 0 Keyword tree for M Keywords: = = = = E = E Interpretation: branched from when acquired new trait (old ). and had already acquired trait new. Using original character labels 0 0 0 0 0 0 0 0 0 0 0 0 E 0 0 0 0 E

Page 6 Generalized Perfect Phylogeny character or trait is allowed to take on more than two states. E.g. olor of fur. Very Hard Problem = NP-complete. Maximum Parsimony Tree For d characters, find tree with exactly d mutations. No more than is needed. Very Hard Problem. Related to Steiner Trees. Maximum Likelihood Methods Have an underlying probabilistic model of the basic operators in evolution: mutation, insertion, deletion, translocation, etc... lso have a set of sequences. an assess the probability of the data, given the model. an compute the most likely model given the data, using ayes Rule. P(M ) = P( M) P(M)/P() Multiple lignment and Trees Progressive alignment methods do multiple alignment and evolutionary tree construction at the same time. Sequence alignment provides scores which can be interpreted as inversely related to distances in evolution. istances can be used to build trees. Trees can be used to give multiple alignments via common parents. http://www.amnh.org/exhibition/expedition/fossils/index.html Ribosomal Sequences: RP