Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony


(,(,(,))) Trees can be represented in "parenthesis notation". Each set of parentheses represents a branch point (bifurcation), and the comma separates the left and right lineages. Parenthesis notation can contain sequence labels too, e.g. (A,(B,(C,D))).
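A minimal sketch of how parenthesis notation maps onto nested branch points, assuming label-only strings (no branch lengths, quoted labels, or comments). `parse_parentheses` and `count_branch_points` are hypothetical helpers written for this illustration, not part of any phylogenetics library; each tuple they produce corresponds to one set of parentheses, i.e. one branch point.

```python
def parse_parentheses(text: str):
    """Parse label-only parenthesis notation, e.g. "(A,(B,(C,D)));".

    Each pair of parentheses becomes a tuple of its children; each leaf
    becomes its label string ("" for an unlabeled leaf).
    Hypothetical helper: branch lengths such as ":0.5" are NOT handled.
    """
    s = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while s[pos] == ",":  # the comma separates sibling lineages
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]  # leaf label

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected trailing text: {s[pos:]!r}")
    return tree


def count_branch_points(tree) -> int:
    """Each tuple (one set of parentheses) is one branch point."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_branch_points(child) for child in tree)


print(parse_parentheses("(A,(B,(C,D)));"))  # ('A', ('B', ('C', 'D')))
print(count_branch_points(parse_parentheses("(,(,(,)))")))  # 3
```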

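Cladogram: branch lengths have no meaning; only the branching order is shown.
Phylogram: branch lengths represent genetic change.
Ultrametric tree: branch lengths represent evolutionary time.
Parenthesis notation can have both labels and distances: (A:5,(B:1,(C:1,D:6):1):3).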
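To show how labels and distances combine in the notation, the sketch below writes a small tree back out as text. The `Node` class, the `to_parenthesis` function, and the branch lengths in the example are hypothetical; the cladogram and the phylogram share one topology and differ only in whether a `:length` is attached to each branch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node: leaves have a name, internal nodes have children."""
    name: str = ""
    children: List["Node"] = field(default_factory=list)
    length: Optional[float] = None  # branch length to the parent, if known


def to_parenthesis(node: Node) -> str:
    """Write a node (and its subtree) in parenthesis notation."""
    text = ""
    if node.children:
        text = "(" + ",".join(to_parenthesis(c) for c in node.children) + ")"
    text += node.name
    if node.length is not None:
        text += f":{node.length:g}"  # ":g" drops trailing zeros, e.g. 5.0 -> "5"
    return text


def leaf(name: str, length: Optional[float] = None) -> Node:
    return Node(name=name, length=length)


# Hypothetical example: one topology drawn as a cladogram and as a phylogram.
cladogram = Node(children=[leaf("A"), Node(children=[leaf("B"), Node(children=[leaf("C"), leaf("D")])])])
phylogram = Node(children=[
    leaf("A", 5),
    Node(children=[leaf("B", 2), Node(children=[leaf("C", 1), leaf("D", 6)], length=1)], length=3),
])
print(to_parenthesis(cladogram) + ";")  # (A,(B,(C,D)));
print(to_parenthesis(phylogram) + ";")  # (A:5,(B:2,(C:1,D:6):1):3);
```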

Distance metrics
METRIC DISTANCES between any two or three taxa (a, b, and c) have the following properties:
Property 1: $d(a,b) \ge 0$ (non-negativity)
Property 2: $d(a,b) = d(b,a)$ (symmetry)
Property 3: $d(a,b) = 0$ if and only if $a = b$ (distinctness)
Property 4: $d(a,c) \le d(a,b) + d(b,c)$ (triangle inequality)
[Figure: taxa a, b, and c joined by a triangle of distances, illustrating the triangle inequality]
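The four properties can be checked mechanically on a square distance matrix. This is a sketch assuming the matrix is a list of lists indexed by taxon number; `metric_violations` is a hypothetical name, and the example matrix is illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def metric_violations(d: Matrix, tol: float = 1e-9) -> List[Tuple]:
    """Return every violation of Properties 1-4 for a square distance matrix d.

    Each entry is a tuple whose first element names the violated property.
    """
    n = len(d)
    problems = []
    for i in range(n):
        for j in range(n):
            if d[i][j] < -tol:
                problems.append(("non-negativity", i, j))
            if abs(d[i][j] - d[j][i]) > tol:
                problems.append(("symmetry", i, j))
            # distinctness: zero on the diagonal, non-zero everywhere else
            if (i == j) != (abs(d[i][j]) <= tol):
                problems.append(("distinctness", i, j))
            for k in range(n):
                # triangle inequality: the detour i -> j -> k is never shorter than i -> k
                if d[i][k] > d[i][j] + d[j][k] + tol:
                    problems.append(("triangle", i, k, j))
    return problems


example = [
    [0, 3, 5, 7],
    [3, 0, 1, 4],
    [5, 1, 0, 9],
    [7, 4, 9, 0],
]
for problem in metric_violations(example):
    print(problem)
```

For this example every reported problem is a triangle-inequality violation routed through taxon 1; for instance d(2,3) = 9 is larger than d(2,1) + d(1,3) = 5.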

Distance metrics
ULTRAMETRIC DISTANCES must satisfy the previous four conditions, plus:
Property 5: The distances from any branch point to the taxa in the clade defined by that branch point are equal.
[Figure: rooted tree with taxa a, b, c and branch lengths 2, 4, 2, 2]
If distances are ultrametric, then the sequences are evolving in a perfectly clock-like manner, so the two members of any pair of sequences are always equally distant from their common ancestor.
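Property 5 is stated in terms of branch points, but it can be tested directly on a distance matrix with the equivalent three-point condition: for any three taxa, the two largest of their three pairwise distances are equal. A sketch, with a hypothetical function name and made-up example trees:

```python
from itertools import combinations
from typing import List


def is_ultrametric(d: List[List[float]], tol: float = 1e-9) -> bool:
    """Three-point condition: for every triple of taxa, the two largest of the
    three pairwise distances are equal (assumes d already satisfies Properties 1-4)."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if largest - middle > tol:
            return False
    return True


# Distances read off the rooted tree ((a:2,b:2):2,c:4): each taxon is 4 from the root.
clock_like = [
    [0, 4, 8],
    [4, 0, 8],
    [8, 8, 0],
]
# Tree ((a:2,b:3):2,c:4): b has evolved faster than a.
not_clock_like = [
    [0, 5, 8],
    [5, 0, 9],
    [8, 9, 0],
]
print(is_ultrametric(clock_like))      # True
print(is_ultrametric(not_clock_like))  # False
```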

Distance metrics
Additivity
Property 6 (four-point condition): for any four taxa, if (a,b) are nearest neighbors, then
$d(a,b) + d(c,d) \le \max[\,d(a,c) + d(b,d),\ d(a,d) + d(b,c)\,]$
and the two sums inside the brackets are equal.
For distances to fit into an evolutionary tree, they must be additive. Estimated distances often fall short of these criteria, and thus can fail to produce correct evolutionary trees.
[Figure: the paths d(a,b) and d(c,d) on a four-taxon tree]
A lineage that goes backwards in time violates additivity.
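Additivity can likewise be tested with the four-point condition, checked over every set of four taxa. This sketch assumes the input already satisfies Properties 1 to 4; `is_additive` and the example distances are hypothetical.

```python
from itertools import combinations
from typing import List


def is_additive(dist: List[List[float]], tol: float = 1e-9) -> bool:
    """Four-point condition: for every four taxa a, b, c, d, of the three sums
    d(a,b)+d(c,d), d(a,c)+d(b,d), d(a,d)+d(b,c), the two largest are equal.
    Assumes dist is already a metric (Properties 1-4)."""
    for a, b, c, d in combinations(range(len(dist)), 4):
        sums = sorted((
            dist[a][b] + dist[c][d],
            dist[a][c] + dist[b][d],
            dist[a][d] + dist[b][c],
        ))
        if sums[2] - sums[1] > tol:
            return False
    return True


# Pairwise distances from the tree ((A:1,B:2):3,(C:4,D:5)), in the order A, B, C, D.
tree_distances = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]
print(is_additive(tree_distances))  # True: 3+9 <= 8+10 == 9+9

# Perturb d(A,C): no tree can now reproduce every distance exactly.
noisy = [row[:] for row in tree_distances]
noisy[0][2] = noisy[2][0] = 10
print(is_additive(noisy))  # False
```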

What's wrong with these distances?
0  3  5  7
3  0  1  4
5  1  0  9
7  4  9  0

What's wrong with this tree?
[Figure: a tree with labeled branch lengths]

Did the Florida dentist infect his patients with HIV?
[Figure: phylogenetic tree of HIV sequences from the dentist, his patients, and local HIV-infected people (local controls), with clades marked Yes or No]
Yes: the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist. No: these patients' sequences fall outside that clade.
From Ou et al. (1992) and Page & Holmes (1998)

Character-based versus distance-based methods for tree building
Character-based methods: use the aligned sequences directly during tree inference.
[Figure: alignment with taxa (Species A to E) as rows and characters as columns]
Distance-based methods: transform the sequence data into pairwise distances, and then use the matrix during tree building, ignoring characters.
            A      B      C      D      E
Species A   ----   0.20   0.50   0.45   0.40
Species B   0.23   ----   0.40   0.55   0.50
Species C   0.87   0.59   ----   0.15   0.40
Species D   0.73   1.12   0.17   ----   0.25
Species E   0.59   0.89   0.61   0.31   ----

Calculating distances
Uncorrected p-distance: count the differences between two aligned sequences, then divide by the alignment length.
[Figure: the same alignment of Species A to E]
In the distance matrix above, the upper triangle (top) holds the uncorrected p-distances and the lower triangle (bottom) holds the Jukes-Cantor distances.
Jukes-Cantor correction: $K(A,B) = -\frac{3}{4}\ln\left[1 - \frac{4}{3}\,p(A,B)\right]$, where, for example, $p(A,B) = 4/20 = 0.20$.
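A minimal sketch of both distances, assuming two gap-free aligned sequences of equal length; the function names and the 20-site example sequences are hypothetical, chosen so that p = 4/20 as on the slide.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected p-distance: mismatched positions divided by the alignment length.
    Assumes two aligned sequences of equal length with no gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return mismatches / len(seq1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance K = -3/4 ln(1 - 4/3 p).
    Infinite once p reaches 0.75, the value expected for unrelated sequences."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical 20-site sequences differing at 4 positions, so p = 4/20.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGAACGTTCGTACCTACGA"
p = p_distance(s1, s2)
print(p)                          # 0.2
print(round(jukes_cantor(p), 2))  # 0.23
```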

Homoplasy: independent evolution of the same character.
(1) Convergent events (in either related or unrelated entities)
(2) Parallel events (in related entities)
(3) Reversals (in related entities)
[Figure: examples of (1) convergence, (2) parallelism, and (3) reversal]
The Jukes-Cantor correction assumes homoplasy occurs at the rate predicted by random mutations.
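Why the correction is needed can be seen in a small simulation, a sketch that assumes the Jukes-Cantor model (all sites and all replacement bases equally likely). Because some sites are hit more than once, the observed p-distance falls short of the true number of substitutions per site, while the corrected distance recovers it approximately. The sequence length, seed, and true distance are arbitrary choices.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, expected_subs_per_site, rng):
    """Apply substitutions under Jukes-Cantor assumptions: every site equally
    likely to change, and a change picks one of the 3 other bases uniformly.
    Sites can be hit more than once, so some hits are hidden (homoplasy)."""
    out = list(seq)
    n_events = round(expected_subs_per_site * len(seq))
    for _ in range(n_events):
        i = rng.randrange(len(out))
        out[i] = rng.choice([b for b in BASES if b != out[i]])
    return "".join(out)


rng = random.Random(1)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(BASES) for _ in range(10_000))
descendant = evolve(ancestor, 0.5, rng)  # true distance: 0.5 substitutions per site

p = sum(a != b for a, b in zip(ancestor, descendant)) / len(ancestor)
k = -0.75 * math.log(1 - 4 * p / 3)
print(f"true 0.50  p-distance {p:.3f}  Jukes-Cantor {k:.3f}")
```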

Neighbor joining: a distance-based method
Choose the closest neighbors. Add a node between them. Choose the next closest, and so on.
[Figure: tree for Species A to E built from the distance matrix above]
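The slide's description (join the closest neighbors) is a simplification. The standard neighbor-joining algorithm picks, at each step, the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of the distances from i to all active taxa; this is not always the pair with the smallest raw distance. It also computes branch lengths as it goes. A sketch, assuming a small symmetric matrix; the internal node names and the example matrix are hypothetical:

```python
from typing import Dict, List, Tuple


def neighbor_joining(labels: List[str], matrix: List[List[float]]) -> List[Tuple[str, str, float]]:
    """Standard neighbor joining; returns the unrooted tree as (node, node, length) edges.
    New internal nodes are named u0, u1, ... (an arbitrary naming choice)."""
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(matrix[i][j]) for j, b in enumerate(labels)} for i, a in enumerate(labels)
    }
    active = list(labels)
    edges = []
    counter = 0
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}  # net divergence
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"u{counter}"
        counter += 1
        # Branch lengths from the new node u to a and b.
        len_a = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges += [(u, a, len_a), (u, b, len_b)]
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        active = [k for k in active if k not in (a, b)] + [u]
    x, y = active
    edges.append((x, y, d[x][y]))
    return edges


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for edge in neighbor_joining(labels, matrix):
    print(edge)
```

Because the example matrix is additive, the branch lengths reproduce every input distance exactly; with estimated distances they generally will not.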

Neighbor joining: phylogram
Finally, adjust the branch lengths to fit the distances, if possible!
[Figure: the resulting phylogram for Species A to E, with branch lengths]

Fitch-Margoliash algorithm for calculating the branch lengths
1. Find the most closely related pair of sequences, A and B.
2. Calculate the average distance from A to all other sequences, then from B to all other sequences.
3. Adjust the position of the common ancestor node of A and B so that the difference between the averages equals the difference between the A and B branch lengths, while the sum of the branch lengths is still equal to d(A,B). With two other taxa C and D:
$d(A) - d(B) = \frac{d(A,C) + d(A,D)}{2} - \frac{d(B,C) + d(B,D)}{2}, \qquad d(A) + d(B) = d(A,B)$
where d(A) and d(B) are the branch lengths from the common ancestor to A and to B.
NOTE: the difference between the averages may be greater than d(A,B), making step 3 impossible.
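Steps 2 and 3 amount to solving two linear equations for the two branch lengths. A minimal sketch, with a hypothetical function name and a made-up four-taxon example whose distances come from the tree ((A:1,B:2):3,(C:4,D:5)); the error branch corresponds to the NOTE above.

```python
from typing import Dict, Iterable, Tuple

Dist = Dict[str, Dict[str, float]]


def fm_branch_lengths(dist: Dist, a: str, b: str, others: Iterable[str]) -> Tuple[float, float]:
    """Branch lengths from the common ancestor of a and b to a and to b.

    Solves  len_a - len_b = avg(a, others) - avg(b, others)
            len_a + len_b = d(a, b)
    """
    others = list(others)
    avg_a = sum(dist[a][k] for k in others) / len(others)
    avg_b = sum(dist[b][k] for k in others) / len(others)
    diff = avg_a - avg_b
    if abs(diff) > dist[a][b]:
        # The NOTE case: one branch would need a negative length.
        raise ValueError("difference of averages exceeds d(a, b); step 3 is impossible")
    return (dist[a][b] + diff) / 2, (dist[a][b] - diff) / 2


def symmetric(pairs: Dict[Tuple[str, str], float]) -> Dist:
    """Build a nested, symmetric distance lookup from {(x, y): distance}."""
    out: Dist = {}
    for (x, y), value in pairs.items():
        out.setdefault(x, {})[y] = value
        out.setdefault(y, {})[x] = value
    return out


# Distances from the tree ((A:1,B:2):3,(C:4,D:5)).
dist = symmetric({
    ("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
    ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9,
})
print(fm_branch_lengths(dist, "A", "B", ["C", "D"]))  # (1.0, 2.0)
```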

In class: create a rooted phylogram with 4 taxa
[Figure: four aligned sequences and their pairwise p-distances]
$K(A,B) = -\frac{3}{4}\ln\left[1 - \frac{4}{3}\,\mathrm{pdist}(A,B)\right]$
Directions:
1. Make a distance matrix (p-distance, then convert to JC distance).
2. Use neighbor joining to make a tree.
3. Adjust branch lengths using Fitch-Margoliash.
4. Choose the root using the midpoint method.
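For step 4, midpoint rooting places the root halfway along the longest path between any two leaves. A sketch, assuming the unrooted tree is given as (node, node, branch length) edges; the function names and the example tree are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]


def _distances_from(adj, source: str):
    """Path length and parent of every node, walking outward from source."""
    dist: Dict[str, float] = {source: 0.0}
    parent: Dict[str, Optional[str]] = {source: None}
    stack = [source]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(edges: List[Edge]) -> Tuple[str, str, float]:
    """Return (p, q, offset): the root sits on edge p-q, `offset` away from p,
    halfway along the longest leaf-to-leaf path."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    leaves = [n for n in adj if len(adj[n]) == 1]

    best = None  # (path length, far leaf, distances, parents)
    for x in leaves:
        dist, parent = _distances_from(adj, x)
        for y in leaves:
            if best is None or dist[y] > best[0]:
                best = (dist[y], y, dist, parent)
    total, node, dist, parent = best
    half = total / 2
    # Walk from the far leaf back toward the start leaf until the parent
    # lies at or before the halfway point; the midpoint is on that edge.
    while dist[parent[node]] > half:
        node = parent[node]
    p = parent[node]
    return p, node, half - dist[p]


# Unrooted tree (as node, node, length edges) for hypothetical taxa a to e.
edges = [("u0", "a", 2.0), ("u0", "b", 3.0), ("u1", "c", 4.0), ("u1", "u0", 3.0),
         ("u2", "d", 2.0), ("u2", "e", 1.0), ("u1", "u2", 2.0)]
print(midpoint_root(edges))  # ('u0', 'u1', 2.0)
```

Here the longest leaf-to-leaf path (b to c, length 10) crosses edge u0-u1, so the root lands 5 from b, i.e. 2 along that edge from u0.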

Which method do I use?
Strong sequence similarity: distance methods
Weak sequence similarity: parsimony
Very weak sequence similarity: maximum likelihood

Maximum parsimony -- it's character-based
Optimality criterion: the most parsimonious tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
[Figure: alignment of Species A to E with one column mapped onto a tree]
For this column and this tree, one mutation event is required.

Character-based tree building
For this other column, the same tree requires two mutation events. A different tree would require only one.
[Figure: the same alignment with a second column mapped onto the tree]

Finding the minimum number of mutations
Given a tree and a set of taxa, one letter each:
(1) Choose the optional (candidate) characters for each ancestor.
(2) Select the root character that minimizes the number of mutations, by trying each candidate and propagating it through the tree.
[Figure: two trees, one requiring a minimum of 2 mutations and one a minimum of 1 mutation]
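The two steps correspond to Fitch's small-parsimony algorithm (the slide does not name it): a bottom-up pass keeps the intersection of the children's candidate sets when it is non-empty, and otherwise takes their union and counts one mutation. Choosing the root character and propagating it downward (step 2) recovers actual ancestral states; the count itself comes from the bottom-up pass alone. A sketch for one alignment column on a rooted binary tree written as nested tuples; the names are hypothetical:

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name, or a pair of subtrees


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.

    Returns the candidate characters for this node and the minimum number of
    mutations needed in the subtree below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # the children can agree: no extra mutation needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one mutation below this node


tree = (("A", "B"), ("C", "D"))
print(fitch(tree, {"A": "G", "B": "G", "C": "T", "D": "T"})[1])  # 1
print(fitch(tree, {"A": "G", "B": "T", "C": "G", "D": "T"})[1])  # 2
```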

Ignore non-informative sites
No mismatches: 0 mutations, on all trees.
One mismatch: 1 mutation, on all trees.
All different: all trees equivalent.
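These cases are all covered by the usual definition of a parsimony-informative site: at least two different characters, each present in at least two sequences. A sketch with a hypothetical function name and a made-up alignment:

```python
from collections import Counter
from typing import List


def informative_sites(alignment: List[str]) -> List[int]:
    """Indices of parsimony-informative columns: at least two different
    characters, each appearing in at least two sequences."""
    informative = []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative.append(i)
    return informative


alignment = [
    "AAGCA",
    "AAGTA",
    "AGTAA",
    "AGTGC",
]
# column 0: no mismatches; column 4: one mismatch; column 3: all different
print(informative_sites(alignment))  # [1, 2]
```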

Max unweighted parsimony: trying all trees
[Figure: a candidate tree for Species A to E scored column by column against the alignment, with per-column totals]
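Trying all trees can be sketched by generating every rooted binary tree through stepwise leaf addition and scoring each with Fitch's per-column count. The parsimony score does not depend on where the root is placed, so each unrooted tree appears several times with the same score. The number of rooted trees grows as (2n - 3)!! (15 for four taxa, 105 for five), so this brute force is only practical for a handful of taxa. All names and the example sequences are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def add_leaf_everywhere(tree: Tree, leaf: str) -> Iterator[Tree]:
    """Every rooted tree obtained by attaching `leaf` to one branch of `tree`
    (including a new branch above the current root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_leaf_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf_everywhere(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa: List[str]) -> Iterator[Tree]:
    if len(taxa) == 1:
        yield taxa[0]
        return
    for smaller in all_rooted_trees(taxa[:-1]):
        yield from add_leaf_everywhere(smaller, taxa[-1])


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's count of the minimum mutations for one column."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum of the per-column minimum mutation counts (unweighted)."""
    length = len(next(iter(seqs.values())))
    return sum(fitch(tree, {t: s[i] for t, s in seqs.items()})[1] for i in range(length))


def most_parsimonious(seqs: Dict[str, str]) -> Tuple[int, List[Tree]]:
    scored = [(parsimony_score(t, seqs), t) for t in all_rooted_trees(sorted(seqs))]
    best = min(score for score, _ in scored)
    return best, [t for score, t in scored if score == best]


seqs = {"A": "GGC", "B": "GGT", "C": "TTC", "D": "TTT"}
score, trees = most_parsimonious(seqs)
print(score)     # 4
print(trees[0])
```

In the example, the first two columns favour grouping A with B and the third favours A with C; the (A,B)(C,D) tree wins with 1 + 1 + 2 = 4 mutations, and all five of its rootings tie.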