Evolutionary Tree Analysis. Overview

Similar documents
CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

Phylogenetic trees 07/10/13


Phylogenetic Tree Reconstruction

BINF6201/8201. Molecular phylogenetic methods

Phylogeny: traditional and Bayesian approaches

Theory of Evolution Charles Darwin

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

EVOLUTIONARY DISTANCES

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

What is Phylogenetics

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Constructing Evolutionary/Phylogenetic Trees

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Theory of Evolution. Charles Darwin

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Algorithms in Bioinformatics

Phylogeny: building the tree of life

Phylogenetic inference

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Molecular Evolution and Phylogenetic Tree Reconstruction

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics. BIOL 7711 Computational Bioscience

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Multiple Sequence Alignment. Sequences

Phylogenetic analyses. Kirsi Kostamo

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Phylogeny Tree Algorithms

Phylogenetics: Parsimony

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Phylogenetics: Building Phylogenetic Trees

Pattern Matching (Exact Matching) Overview

Phylogeny. November 7, 2017

Constructing Evolutionary/Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

A (short) introduction to phylogenetics

Seuqence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

C.DARWIN ( )

Dr. Amira A. AL-Hosary

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

Algorithms for Phylogenetic Reconstructions

Introduction to characters and parsimony analysis

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Phylogenetic Networks, Trees, and Clusters

Lecture 10: Phylogeny

A Phylogenetic Network Construction due to Constrained Recombination

Evolutionary Trees. Evolutionary tree. To describe the evolutionary relationship among species A 3 A 2 A 4. R.C.T. Lee and Chin Lung Lu

Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family

Phylogeny Jan 5, 2016

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Reading for Lecture 13 Release v10

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Cladistics and Bioinformatics Questions 2013

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

BMI/CS 776 Lecture 4. Colin Dewey

MOLECULAR EVOLUTION AND PHYLOGENETICS SERGEI L KOSAKOVSKY POND CSE/BIMM/BENG 181 MAY 27, 2011

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

8/23/2014. Phylogeny and the Tree of Life

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Chapter 3: Phylogenetics

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Gel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

Is the equal branch length model a parsimony model?

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

molecular evolution and phylogenetics

Molecular Phylogenetics (Hannes Luz)

Consistency Index (CI)

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Module 13: Molecular Phylogenetics

Comparative Genomics II

Lecture 6 Phylogenetic Inference

! A species tree aims at representing the evolutionary relationships between species. ! Species trees and gene trees are generally related...

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

How to read and make phylogenetic trees Zuzana Starostová

Phylogeny and Molecular Evolution. Introduction

Phylogenetic inference: from sequences to trees

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Properties of normal phylogenetic networks

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

66 Bioinformatics I, WS 09-10, D. Huson, December 1, Evolutionary tree of organisms, Ernst Haeckel, 1866

Copyright 2000 N. AYDIN. All rights reserved. 1

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Evolutionary Models. Evolutionary Models

Jed Chou. April 13, 2015

Transcription:

CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based Evolutionary Tree Reconstruction 1

Phylogenetic Tree Phylogenetics The study of evolutionary relatedness among species Phylogenetic Tree (Evolutionary Tree) Tree-structure diagram showing the inferred evolutionary relationships between a set of objects The objects are called texa individual genes, or species when orthologous genes are used Each node represents each texon Each edge represents the evolutionary relationship between texa Types of Phylogenetic Trees (1) Rooted Tree vs. Unrooted Tree Rooted trees Root - the most ancient species External nodes (leaf nodes) - nodes with degree-1 - existing species Unrooted trees Internal nodes - nodes with degree > 1 - hypothetical ancestral species (before speciation events) 2

Types of Phylogenetic Trees (2) Cladogram vs. Additive Tree vs. Ultrametric Tree Cladogram - Defines tree topology only - No meaning on branch lengths Additive tree - Branch lengths are a measure of evolutionary divergence evolutionary distance - Weighted trees Ultrametric tree - The vertical axis is a time scale - Rooted trees Types of Phylogenetic Trees (3) Bifurcating vs. Multifurcating Bifurcating (or Dichotomous) - Each texon as an internal node diverges into two separate descendent texa - Fully resolved - What is the degree of internal nodes? - How many nodes a rooted bifurcating tree with N leaves has? - How many nodes an unrooted bifurcating tree with N leaves has? Multifurcating (or Polytomous) - Each texon diverges into more than two separate descendent texa - Partially resolved 3

Types of Phylogenetic Trees (4) Condense Tree Types of Phylogenetic Trees (5) Species Tree vs. Gene Tree Species tree - Evolutionary relationships between species Gene (or gene family) tree - Evolutionary relationship between homologous genes - Some branch points represent gene duplication events - Other branch points represent speciation events 4

Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based Evolutionary Tree Reconstruction Evolutionary Distance Evolutionary Path The path from the root to a leaf in a rooted tree Evolutionary Distance Sum of weights of the shortest path between two leaf nodes in a weighted tree Example Additive unrooted tree 2 3 4 12 13 16 13 14 17 12 12 13 5 1 6 5

Evolutionary Tree Reconstruction Distance Matrix Given n species (or genes), n n matrix D D ij = edit distance (inverse of sequence similarity) between two species (or genes) i and j Evolutionary Tree Reconstruction Constructs an evolutionary tree that best fits the distance matrix Finds an evolutionary tree such that the evolutionary distance between i and j in a tree T equals to the edit distance between them i.e., d ij (T) = D ij Formulation of Distance-based Tree Reconstruction Goal Reconstructing an evolutionary tree from a distance matrix Input An n n distance matrix D Output A weighted unrooted (or rooted) binary tree T with n leaf nodes, which best fits D, if D is additive 6

Additive / Non-additive Distance Matrix Additive Distance Matrix Matrix D is additive if there exists an evolutionary tree T such that d ij (T) = D ij Example?? Non-additive Distance Matrix Matrix D is non-additive otherwise Example?? Four Point Condition Four Point Condition Given 4 4 distance matrix D with the elements a, b, c, d, among three sums of (D ab +D cd ), (D ac +D bd ), and (D ad +D bc ), two are the same, and the other is smaller Example a c x y b d Theorem An n n matrix D is additive if and only if the four point condition holds for every 4 distinct elements, 1 a,b,c,d n 7

UPGMA Algorithm UPGMA Unweighted pair-group method using arithmetic averages Input Distance matrix D: n n matrix of evolutionary distance for n species Output An ultrametric (rooted) tree for n species Process (1) Merge two closest nodes, x and y, to create one ancestor node z (2) Draw the height of z by distance between x and y (3) Remove the rows and columns of x and y in D (4) Insert the row and column of z with average distance in D (5) Repeat (1)~(4) until reaches the root 3-Leaf Tree 3-Leaf Tree A basic component for unrooted, additive (weighted) tree for 3 species d ic + d jc = D ij d ic + d kc = D ik d jc + d kc = D jk d ic = (D ij + D ik D jk ) / 2 d jc = (D ij + D jk D ik ) / 2 d kc = (D ki + D kj D ij ) / 2 n-leaf Tree How many edges for n species? How many variables? How many equations? 8

Solving by Neighboring Leaves (1) Neighbors A pair of nodes that are separated by just one other node Input Distance matrix D: n n matrix of evolutionary distance for n species Output An unrooted, additive (weighted) tree for n species Process (1) Find two closest nodes, x and y, as neighbors (2) Calculate distance from x and y to their ancestor z by 3-leaf tree formula (3) Remove the rows and columns of x and y in D (4) Insert the row and column of z with distance by 3-leaf tree formula (5) Repeat (1)~(4) until completes an unrooted tree Solving by Neighboring Leaves (2) Example Distance matrix A B C D A 0 2 6 5 B 2 0 6 5 C 6 6 0 3 D 5 5 3 0 Problem The closest leaves are NOT necessarily neighbors 9

Solving by Degenerate Triples (1) Degenerate Triples A set of three distinct elements 1 i, j, k n where D ij + D jk = D ik The leaf node j in a degenerate triple i, j, k lies on the evolutionary path from i to k Process If distance matrix D has a degenerate triple i, j, k, then we remove j from D to reduce the size of the problem If distance matrix D does not have a degenerate triple i, j, k, then we create a degenerate triple in D by shortening all hanging edges in the tree Solving by Degenerate Triples (2) Shortening Hanging Edges All hanging edges are reduced by the same amount δ All pair-wise distances in the matrix are reduced by 2δ Example δ =1 δ =3 10

Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based Evolutionary Tree Reconstruction Background of Character-Based Approach Main Concept Find evolutionary history with the minimum number of character changes (or mutations) between species (i.e., ortholog genes) Consider mutation at each position of the sequences separately (single point mutations) Edge Weight Observed character differences resulted from the mutations Hamming distance between two species (i.e., two ortholog genes) 11

Parsimony Methods Parsimony Score Sum of the weights of all edges in the phylogenetic tree Examples higher parsimony score less parsimonious lower parsimony score more parsimonious Evolutionary Tree Reconstruction Constructs an evolutionary tree having the lowest parsimony score Formulation of Character-based Tree Reconstruction Goal Finding an evolutionary tree with n leaf nodes Input An n m alignment matrix D n = # species (sequences) m = # characters (length of each sequence) Output An evolutionary tree T with n leaf nodes, minimizing the parsimony score 12

Process of Character-based Tree Reconstruction Tasks Assigning n sequences to leaf nodes Determining sequences at internal nodes large parsimony problem small parsimony problem Process For each position, find the node labels (a character for each node) minimizing the parsimony score Formulation of Small Parsimony Problem Goal Finding the most parsimonious labeling of the internal nodes in an evolutionary tree Input A rooted tree T with n leaf nodes labeled by n strings of length-m Output Labels (strings) of internal nodes of T, minimizing the parsimony score 13

Fitch Algorithm (1) Fitch Algorithm (1) Assigns a set of characters to each node, traversing the tree from leaf nodes to root If two sets of characters from child nodes u and w of a node v overlap, assigns the common set of them to v If not, assigns the combined set of them to v if S u and S w overlap otherwise (2) Assigns labels to each node, traversing the tree from root to leaf nodes For the root, chooses one arbitrarily from its set of characters For all other nodes, if its parent s label is in its set of characters, assigns its parent s label Else, choose one arbitrarily from its set of characters Fitch Algorithm (2) Example A {A,C} {G} A G A C G G A C G G {A,C,G} A {A,C} {G} {A,C} {G} A C G G A C G G Parsimony score? 14

Unweighted vs. Weighted Parsimony Unweighted Parsimony Problem Evolutionary tree Scoring matrix A 0 1 1 1 T 1 0 1 1 G 1 1 0 1 C 1 1 1 0 Weighted Parsimony Problem Scoring matrix Evolutionary tree A 0 3 4 9 T 3 0 2 4 G 4 2 0 4 C 9 4 4 0 Formulation of Small Weighted Parsimony Problem Goal Finding the minimal weighted parsimony score labeling of the internal nodes in an evolutionary tree Extended version of the small parsimony problem Input A rooted tree T with n leaf nodes labeled by n strings of length-m having k distinct characters k k scoring matrix Output Labels (strings) of internal nodes T, minimizing the weighted parsimony score 15

Sankoff Algorithm (1) Sub-tree r v u w Sankoff Algorithm Calculating a parsimony score for every possible label at each node v s t (v) = the minimum parsimony score of the sub-tree rooted at v if v has the character t Scoring at each node based on the scores of its child nodes dynamic programming Sankoff Algorithm (2) Example A 0 3 4 9 T 3 0 2 4 G 4 2 0 4 C 9 4 4 0 0 A C T G 0 0 0 9 7 8 9 7 2 2 8 A C T G 14 9 10 15 T 0 0 T T 3 4 0 2 A C T G 16

Formulation of Large Parsimony Problem Goal Finding an evolutionary tree with n leaf nodes, having the minimal parsimony score Input An n m alignment matrix D n = # species (sequences) m = # characters (length of each sequence) Output An evolutionary tree with n leaf nodes labeled by n rows of length m and internal nodes labeled by strings, such that the parsimony score is minimized Exhaustive Search Algorithm Process (1) Enumerates all possible tree structures with n leaf nodes (2) Solves the small parsimony problem for each structure (3) Selects the best one Problem Number of all possible tree structures grows exponentially w.r.t. n 17

Greedy Algorithm Nearest Neighbor Interchange Algorithm (1) Starts with an arbitrary tree (2) Interchanges two neighbor trees if it provides the best improvement in parsimony score (3) Repeat (2) in each subtree Example Questions? Lecture Slides are found on the Course Website, web.ecs.baylor.edu/faculty/cho/5330 18