Molecular Evolution and Phylogenetic Tree Reconstruction

Similar documents
CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

Phylogenetic trees 07/10/13

Phylogeny: traditional and Bayesian approaches

EVOLUTIONARY DISTANCES


Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

CSCI1950 Z Computa4onal Methods for Biology Lecture 5

Phylogeny: building the tree of life

BINF6201/8201. Molecular phylogenetic methods

Evolutionary Tree Analysis. Overview

Constructing Evolutionary/Phylogenetic Trees

Phylogenetic Tree Reconstruction

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Theory of Evolution Charles Darwin

Math 239: Discrete Mathematics for the Life Sciences Spring Lecture 14 March 11. Scribe/ Editor: Maria Angelica Cueto/ C.E.

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Algorithms in Bioinformatics

Reconstruction of certain phylogenetic networks from their tree-average distances

进化树构建方法的概率方法 第 4 章 : 进化树构建的概率方法 问题介绍. 部分 lid 修改自 i i f l 的 ih l i

Phylogenetic inference

Lecture 18 Molecular Evolution and Phylogenetics

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Concepts and Methods in Molecular Divergence Time Estimation

Constructing Evolutionary/Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Hierarchical Clustering

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Dr. Amira A. AL-Hosary

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Phylogenetics. BIOL 7711 Computational Bioscience

Week 5: Distance methods, DNA and protein models

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Using algebraic geometry for phylogenetic reconstruction

Lecture 10: Phylogeny

Phylogeny Tree Algorithms

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Tree-average distances on certain phylogenetic networks have their weights uniquely determined

Multiple Sequence Alignment. Sequences

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Phylogeny Jan 5, 2016

Reconstruire le passé biologique modèles, méthodes, performances, limites

What is Phylogenetics

Theory of Evolution. Charles Darwin

Reading for Lecture 13 Release v10

Constructing Evolutionary Trees

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Modeling Noise in Genetic Sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

A few logs suce to build (almost) all trees: Part II

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Phylogenetic Assumptions

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

Variances of the Average Numbers of Nucleotide Substitutions Within and Between Populations

Introduction to Bioinformatics Introduction to Bioinformatics

Identity. "At least one dog has fleas" is translated by an existential quantifier"

Consistency Index (CI)

Evolutionary Models. Evolutionary Models

Understanding relationship between homologous sequences

A (short) introduction to phylogenetics

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

What Is Conservation?

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

3D Elasticity Theory

p(-,i)+p(,i)+p(-,v)+p(i,v),v)+p(i,v)

Neighbor Joining Algorithms for Inferring Phylogenies via LCA-Distances

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Chapter 3: Phylogenetics

BMI/CS 776 Lecture 4. Colin Dewey

Molecular Phylogenetics (Hannes Luz)

Molecular Evolution, course # Final Exam, May 3, 2006

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

Evolutionary Analysis of Viral Genomes

66 Bioinformatics I, WS 09-10, D. Huson, December 1, Evolutionary tree of organisms, Ernst Haeckel, 1866

Supplementary Information

Phylogenetic inference: from sequences to trees

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

The Generalized Neighbor Joining method

Sequence Analysis '17- lecture 8. Multiple sequence alignment

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Phylogenetics: Parsimony and Likelihood. COMP Spring 2016 Luay Nakhleh, Rice University

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Transcription:

1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5

Orthology, Paralogy, Inparalogs, Outparalogs

Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length represents evolution time AKA genetic distance Not necessarily chronological time

Inferring Phylogenetic Trees Trees can be inferred by several criteria: Morphology of the organisms Can lead to mistakes Sequence comparison Example: Mouse: ACAGTGACGCCCCAAACGT Rat: ACAGTGACGCTACAAACGT Baboon: CCTGTGACGTAACAAACGA Chimp: CCTGTGACGTAGCAAACGA Human: CCTGTGACGTAGCAAACGA

Distance Between Two Sequences Basic principle: Distance proportional to degree of independent sequence evolution Given sequences x i, x j, d ij = distance between the two sequences One possible definition: d ij = fraction f of sites u where x i [u] x j [u] Better scores are derived by modeling evolution as a continuous change process

Molecular Evolution Modeling sequence substitution: Consider what happens at a position for time Δt, P(t) = vector of probabilities of {A,C,G,T} at time t µ AC = rate of transition from A to C per unit time µ A = µ AC + µ AG + µ AT rate of transition out of A p A (t+δt) = p A (t) p A (t) µ A Δt + p C (t) µ CA Δt + p G (t) µ GA Δt + p T (t) µ TA Δt

Molecular Evolution In matrix/vector notation, we get P(t+Δt) = P(t) + Q P(t) Δt where Q is the substitution rate matrix

Molecular Evolution This is a differential equation: P (t) = Q P(t) Q => prob. distribution over {A,C,G,T} at each position, stationary (equilibrium) frequencies π A, π C, π G, π T Each Q is an evolutionary model Some work better than others

Evolutionary Models Jukes-Cantor Kimura Felsenstein HKY

Estimating Distances Solve the differential equation and compute expected evolutionary time given sequences P (t) = Q P(t) Jukes-Cantor: Let P AA (t) = P CC (t) = P CC (t) = P CC (t) = r P AC (t) = = P TG (t) = s Then, r (t) = - ¾ r(t) µ + ¾ s(t) µ s (t) = - ¼ s(t) µ + ¼ r(t) µ Which is satisfied by r(t) = ¼ (1 + 3e -µt ) s(t) = ¼ (1 - e -µt )

Estimating Distances Solve the differential equation and compute expected evolutionary time given sequences P (t) = Q P(t) Jukes-Cantor:

Estimating Distances Let p = probability a base is different between two sequences, Solve to find t Jukes-Cantor r(t) = 1 p = ¼ (1 + 3e -µt ) p = ¾ ¾ e -µt Therefore, ¾ p = ¾ e -µt 1 4p/3 = e -µt µt = - ln(1 4p/3) Letting d = ¾ µt, denoting substitutions per site,

Estimating Distances d: Branch length in terms of substitutions per site Jukes-Cantor Kimura

Simple method for building tree: UPGMA UPGMA (unweighted pair group method using arithmetic averages) Or the Average Linkage Method Given two disjoint clusters C i, C j of sequences, 1 d ij = Σ {p Ci, q Cj} d pq C i C j Claim that if C k = C i C j, then distance to another cluster C l is: d il C i + d jl C j d kl = C i + C j

Algorithm: Average Linkage Initialization: Assign each x i into its own cluster C i Define one leaf per sequence, height 0 1 4 Iteration: Find two clusters C i, C j s.t. d ij is min Let C k = C i C j Define node connecting C i, C j, and place it at height d ij /2 Delete C i, C j 3 2 5 Termination: When two clusters i, j remain, place root at height d ij /2 1 4 2 3 5

Average Linkage Example v w x y z v 0 6 8 8 8 w 0 8 8 8 x 0 4 4 y 0 2 z 0 v w x yz v w xyz v 0 6 8 w 0 8 xyz 0 vw xyz vw 0 8 xyz 0 4 v 0 6 8 8 w 0 8 8 x 0 4 yz 0 3 v w x y 2 z 1

Ultrametric Distances and Molecular Clock Definition: A distance function d(.,.) is ultrametric if for any three distances d ij d ik d ij, it is true that The Molecular Clock: d ij d ik = d jk The evolutionary distance between species x and y is 2 the Earth time to reach the nearest common ancestor That is, the molecular clock has constant rate in all species years 1 4 2 3 5 The molecular clock results in ultrametric distances

Ultrametric Distances & Average Linkage 1 4 2 3 5 Average Linkage is guaranteed to reconstruct correctly a binary tree with ultrametric distances Proof: Exercise

Weakness of Average Linkage Molecular clock: all species evolve at the same rate (Earth time) However, certain species (e.g., mouse, rat) evolve much faster Example where UPGMA messes up: Correct tree AL tree 2 3 1 4 1 4 3 2

Additive Distances 1 3 8 d 1,4 13 12 4 7 9 11 5 2 10 6 Given a tree, a distance measure is additive if the distance between any pair of leaves is the sum of lengths of edges connecting them Given a tree T & additive distances d ij, can uniquely reconstruct edge lengths: Find two neighboring leaves i, j, with common parent k Place parent node k at distance d km = ½ (d im + d jm d ij ) from any node m i, j

Additive Distances x z y w For any four leaves x, y, z, w, consider the three sums d(x, y) + d(z, w) d(x, z) + d(y, w) d(x, w) + d(y, z) One of them is smaller than the other two, which are equal d(x, y) + d(z, w) < d(x, z) + d(y, w) = d(x, w) + d(y, z)

Reconstructing Additive Distances Given T x D y 4 5 T v w x y z v 0 10 17 16 16 z 7 3 3 4 w w 0 15 14 14 x 0 9 15 y 0 14 z 0 If we know T and D, but do not know the length of each leaf, we can reconstruct those lengths 6 v

Reconstructing Additive Distances Given T x D y T v w x y z v 0 10 17 16 16 z w w 0 15 14 14 x 0 9 15 v y 0 14 z 0

Reconstructing Additive Distances Given T D v w x y z v 0 10 17 16 16 w 0 15 14 14 x 0 9 15 y 0 14 z 0 x y z a T w D 1 a x y z a 0 11 10 10 x 0 9 15 y 0 14 z 0 d ax = ½ (d vx + d wx d vw ) d ay = ½ (d vy + d wy d vw ) d az = ½ (d vz + d wz d vw ) v

Reconstructing Additive Distances Given T D 1 x a x y z a 0 11 10 10 y 5 x 0 9 15 4 b y 0 14 3 z z 0 3 7 c a 4 T w D 2 a b z a 0 6 10 b 0 10 z 0 D 3 a c a 0 3 c 0 6 d(a, c) = 3 d(b, c) = d(a, b) d(a, c) = 3 d(c, z) = d(a, z) d(a, c) = 7 d(b, x) = d(a, x) d(a, b) = 5 d(b, y) = d(a, y) d(a, b) = 4 d(a, w) = d(z, w) d(a, z) = 4 d(a, v) = d(z, v) d(a, z) = 6 Correct!!! v

Neighbor-Joining Guaranteed to produce the correct tree if distance is additive May produce a good tree even when distance is not additive 1 3 Step 1: Finding neighboring leaves Define D ij = (N 2) d ij k i d ik k j d jk 0.1 0.1 0.1 0.4 0.4 2 4 Claim: The above magic trick ensures that i, j are neighbors if D ij is minimal

Neighbor-Joining D ij = (N 2) d ij k i d ik k j d jk 1 3 0.1 0.1 0.1 0.4 0.4 2 4

Neighbor-Joining D ij = (N 2) d ij k i d ik k j d jk 1 3 0.1 0.1 0.1 0.4 0.4 - All leaf edges appear negatively exactly twice 2 4 - All other edges appear negatively once for every path from each of the two leaves i, j, to leaves k i, j

Some Trees

Some Trees

Some Trees

Some Trees

Some Trees

Some Trees