Phylogenetic Inference: Building Trees

Philippe Lemey and Marc A. Suchard
Rega Institute, Department of Microbiology and Immunology, K.U. Leuven, Belgium; Departments of Biomathematics and Human Genetics, David Geffen School of Medicine at UCLA; Department of Biostatistics, UCLA School of Public Health
SISMID

Intra-Host Viral Evolution

Retroviruses (and HBV) exist as a quasi-species within infected patients:
- Shared substitutions may be insufficient to resolve intra-host phylogenies
- Improve resolution using a joint model: 1195 env sequences from 9 HIV+ patients [taken from Rambaut et al. (2004)]
- Indel rates vs. substitution rates
- Opportunity to detect intra-host recombination

Reconstructing the Tree of Life: Are Humans Just Big Slime Molds?

[Figure: tree of life with legend (branches with clade support > .99; scale bar .10) and clade labels Eubacteria, Euryarchaeota, Eocytes and Eukaryotes. Taxa shown: Thermotoga, Chloroflexus, Thermus, Escherichia, Streptomyces, Pyrococcus, Nanoarchaeum, Archaeoglobus, Ferroplasma, Methanosarcina, Halobacterium, Methanothermobacter, Methanococcus, Aeropyrum, Sulfolobus, Pyrobaculum, Giardia, Entamoeba, Saccharum, Euglena, Trypanosoma, Dictyostelium, Saccharomyces and Homo. A second panel shows a partial protein alignment for twelve of these taxa, with regions labeled Uncertain and Certain.]

Contentious issue among paleobiologists: Do Archaea (Euryarchaeota/Eocytes) form one or two domains?
Weekly World News calls humans slime molds.

The Chicken or the (Small) Genome: Which Came First?

[Figure: Evolutionary History and Genome Sizes of Reptiles, Dinosaurs, Birds and Mammals; groups labeled Reptiles, Mammals and Birds; axis: log genome size.]

Issue: Bird genomes are markedly smaller than those of other vertebrates.
Question: Did small genomes precede flight, or did the two co-evolve?

Maximum Parsimony (MP)

- Most often used, not best, not even statistically consistent, but fast, fast, fast ... if you know the tree
- Key: find the tree with the minimal # of suspected substitutions (internal states are not observed; 0/1 model process)
- Counting the minimum # of substitutions is easy
- Enumerating (searching through) all possible trees is hard

[Figure: alignment of Human, Chimp, Mouse and Fly along a molecular sequence with one site highlighted (sites are independent), and the corresponding tree annotated with candidate state sets and arrows labeled Post-Order Traversal and Pre-Order Traversal.]

Maximum Parsimony (MP): a little history

- Anthony Edwards / Luca Cavalli-Sforza (1963, 1964)
  - Both students of R. A. Fisher
  - Introduced both parsimony and likelihood methods (for continuous quantities, e.g. gene frequencies) in one paper
- Camin and Sokal (1965) provide the first program for molecular sequences
- Fitch and Margoliash (1967) provide an efficient algorithm
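To give a rough sense of why enumerating trees is hard, the sketch below counts rooted, bifurcating, leaf-labeled trees using the standard product 3 × 5 × ... × (2N − 3). The function name `num_rooted_trees` is illustrative only.

```python
def num_rooted_trees(n_taxa: int) -> int:
    """Number of rooted, bifurcating, leaf-labeled trees: 3 * 5 * ... * (2N - 3)."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2N - 3 (empty product for N = 2).
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


for n in (4, 10, 20):
    print(n, num_rooted_trees(n))
```

Even at 20 taxa the count has more than 20 digits, which is why exhaustive search is replaced by heuristics or bounding strategies.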

Maximum Parsimony Algorithm

procedure Fitch and Margoliash (1967) Algorithm
    cost $C \leftarrow 0$ {Initialization}
    pointer $k \leftarrow 2N - 1$ {at the root node}
    To obtain the set $R_k$ of possible states at node $k$: {Recursion}
    if $k$ is a leaf then
        $R_k \leftarrow$ observed character for taxon $k$
    else
        Compute $R_i$, $R_j$ for daughters $i$, $j$ of $k$
        if $R_i \cap R_j \neq \emptyset$ then
            $R_k \leftarrow R_i \cap R_j$
        else
            $R_k \leftarrow R_i \cup R_j$
            $C \leftarrow C + 1$
        end if
    end if
    minimum cost is $C$ {Termination}

[Figure: example tree with nodes numbered up to 7, annotated with state sets and arrows labeled Post-Order Traversal and Pre-Order Traversal.]

Searching for the MP Tree

Complexity:
- Finding the MP score is NP-complete
- Finding the MP tree is NP-hard

Recall that the # of N-taxon rooted trees is $3 \times 5 \times \cdots \times (2N - 3)$.

[Figure: stepwise addition of taxa A, B, C, D, giving trees with 3, 5 and 7 branches.]

Attack the exponential-order space with Branch-and-Bound:
- Monotonic order: $\min PS_2 \le \min PS_3 \le \cdots$
- Bound if $\min PS_k >$ best n-taxon PS found so far.
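The recursion above can be sketched for a single site as follows. This is a minimal illustration, not the authors' code: the nested-tuple tree encoding, the function name `fitch_score`, and the example site are all assumptions made for the sketch.

```python
def fitch_score(tree, leaf_states):
    """Return (state_set, cost) for one site via a post-order traversal.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    leaf_states: dict mapping leaf name -> observed character.
    """
    if isinstance(tree, str):
        # Leaf: the state set is just the observed character.
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], leaf_states)
    right_set, right_cost = fitch_score(tree[1], leaf_states)
    cost = left_cost + right_cost
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra substitution is needed here.
        return common, cost
    # Empty intersection: take the union and count one substitution.
    return left_set | right_set, cost + 1


# Hypothetical site pattern on a fixed four-taxon tree.
tree = (("Human", "Chimp"), ("Mouse", "Fly"))
site = {"Human": "A", "Chimp": "A", "Mouse": "G", "Fly": "A"}
root_set, cost = fitch_score(tree, site)
print(root_set, cost)
```

Because sites are treated as independent, the parsimony score of an alignment on a given tree is the sum of these per-site costs.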

Neighbor-Joining (Saitou and Nei, 1987)

Computational algorithm: alignment → single tree
1. Site (data)
2. Pairwise distances
3. Join (closest) neighbors, e.g. 1 and 2 into 12*
4. Recompute distances (using 12*, 3, 4)

Advantages: very fast, great for 1000s of sequences
Disadvantages: no site-to-site rate variation, no natural ways to compare trees/measure data support

Neighbor-Joining

Caveat: Pairs $i, j$ with minimum $d_{ij}$ are not necessarily nearest neighbors. E.g., $d_{AB} = 3 < d_{AC} = 5$.

Solution: Subtract off the average distances to all other leaves via

$$D_{ij} = d_{ij} - (r_i + r_j), \qquad r_i = \frac{1}{|L| - 2} \sum_{k \in L} d_{ik},$$

where $L$ is the current set of leaves. Proof in Studier and Keppler (1988).

Computational: $O(N^3)$
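The caveat can be checked numerically. The sketch below uses a hypothetical additive distance matrix, generated from an assumed tree ((A,C),(B,D)) with leaf branches A = 1, C = 4, B = 1, D = 4 and an internal branch of 1, so that A and B are closest in raw distance even though they are not neighbors. The function name `nj_corrected` is illustrative.

```python
from itertools import combinations


def nj_corrected(dist, leaves):
    """Return {(i, j): D_ij} with D_ij = d_ij - (r_i + r_j)."""
    n = len(leaves)
    # r_i is the sum of distances from i to the other leaves, divided by |L| - 2.
    r = {i: sum(dist[frozenset((i, k))] for k in leaves if k != i) / (n - 2)
         for i in leaves}
    return {(i, j): dist[frozenset((i, j))] - (r[i] + r[j])
            for i, j in combinations(leaves, 2)}


# Hypothetical distances (assumption): additive on the tree ((A,C),(B,D)).
leaves = ["A", "B", "C", "D"]
raw = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
       ("B", "C"): 6, ("B", "D"): 5, ("C", "D"): 9}
dist = {frozenset(pair): d for pair, d in raw.items()}

closest_raw = min(raw, key=raw.get)            # pair with the smallest d_ij
corrected = nj_corrected(dist, leaves)
closest_nj = min(corrected, key=corrected.get)  # pair with the smallest D_ij
print(closest_raw, closest_nj)
```

On this matrix the raw minimum picks (A, B), while the corrected criterion is minimized by the true neighbor pairs (A, C) and (B, D), which tie.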

Likelihood-based Methods (Felsenstein, 1973)

Statistical technique: assumes an unknown tree and a stochastic model for character change along the tree

[Figure: site (data) at the tips of an unknown tree with unknown internal node states; character state plotted against time as a Continuous Time Markov Chain.]

Advantages: site-to-site rate/tree variation is easy, can formulate probability statements
Disadvantages: must search tree-space → slow

Foundation of Bayesian Phylogenetics

Traditional Phylogenetic Reconstruction

[Figure: reconstruction example, an alignment of Human, Chimp, Mouse and Fly along a molecular sequence with one site pattern in a dotted box.]

- Substitution: a single residue replaces another
- Insertion/deletion: residues are inserted or deleted

Statistical Model

Assume: Homologous sites are iid and site patterns (e.g. dotted box) $XY \ldots Z \sim \text{Multinomial}(p_{XY \ldots Z})$, where $p_{XY \ldots Z}$ is determined by an unknown tree $\tau$, branch lengths $t$ and a continuous-time Markov chain model (for residue substitution) given by the infinitesimal rate matrix $Q$:

$$P(X \to Y \text{ in time } t) = e^{tQ}$$

CTMC(Q): the Normal($\mu$, $\sigma^2$) of Phylogenetics

- Continuous in elapsed time $t$, discrete in starting/ending state!
- Memory-less process in which the probability that state $b$ replaces state $a$ during $(t, t + s)$ is $s\,q_{ab} + o(s)$
- The infinitesimal generator matrix $Q$ has off-diagonal entries $q_{ab}$ and row sums $= 0$
- Think: Exponential waiting time with rate $R_a = \sum_{b \neq a} q_{ab}$ until the chain leaves $a$. Then the new state $b$ is independently chosen with probabilities $q_{ab} / R_a$

[Figure: sample path of the chain, state against time.]

From Infinitesimal to Finite Time

Let $p_{ab}(t)$ = the finite-time probability of the chain moving from state $a$ at time 0 to state $b$ at time $t$. Then the matrix $P(t) = \{p_{ab}(t)\}$ satisfies

$$\frac{d}{dt} P(t) = P(t)\,Q, \qquad P(0) = I,$$

with solution

$$P(t) = e^{tQ} = I + tQ + \frac{1}{2}(tQ)^2 + \cdots = \sum_{k=0}^{\infty} \frac{1}{k!}(tQ)^k,$$

as

$$\frac{d}{dt} e^{tQ} = Q e^{tQ} = e^{tQ} Q \quad \text{for } t \text{ real.}$$
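As a small numerical illustration of the series solution, the sketch below approximates $e^{tQ}$ by truncating the power series, and compares the result with the known closed form for a Jukes-Cantor matrix. The helper names, the truncation at 30 terms, and the choice of $Q$ and $t$ are assumptions for the example. A truncated series is adequate only for small $tQ$; production software typically relies on eigendecomposition or scaling-and-squaring instead.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_series(q, t, terms=30):
    """Approximate P(t) = exp(tQ) by the truncated series sum_k (tQ)^k / k!."""
    n = len(q)
    tq = [[t * x for x in row] for row in q]
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: I
    total = [row[:] for row in term]
    for k in range(1, terms):
        # Build (tQ)^k / k! from the previous term.
        term = [[x / k for x in row] for row in mat_mul(term, tq)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


# Jukes-Cantor rate matrix scaled to one expected substitution per unit time.
q_jc = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
p = expm_series(q_jc, 0.5)
closed_form = 0.25 + 0.75 * math.exp(-4.0 * 0.5 / 3.0)
print(p[0][0], closed_form)
```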

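Example: Two-state Model

Consider purines (R) ↔ pyrimidines (Y), with rate $\alpha$ for R → Y and rate $\beta$ for Y → R.

Kolmogorov forward equation:

$$p_{RY}(t + s) = p_{RR}(t)\,\alpha s + p_{RY}(t)(1 - \beta s) + o(s)$$

yielding

$$\frac{d}{dt} p_{RY}(t) = \alpha\,p_{RR}(t) - \beta\,p_{RY}(t), \qquad Q = \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix}.$$

$Q$ has eigenvalues $0$ and $-(\alpha + \beta)$, so solutions of $P(t) = e^{tQ}$ have the form $c + d\,e^{-(\alpha + \beta)t}$.

Standard CTMCs for Phylogenetics

- Jukes and Cantor (JC69): $\pi_a = \frac{1}{4}$, $\kappa_1 = \kappa_2 = 1$
- Kimura (K80): $\pi_a = \frac{1}{4}$, $\kappa_1 = \kappa_2$
- Hasegawa, Kishino and Yano (HKY85): $\kappa_1 = \kappa_2$ (most common)
- Tamura and Nei (TN93): $\kappa_1 \neq \kappa_2$
- General Time Reversible (GTR)

[Figure: diagram of the four nucleotides labeled with stationary frequencies $\pi_A$, $\pi_G$, $\pi_C$, $\pi_T$, illustrating TN93.]

Note the identifiability concern in $e^{tQ}$. A common solution is to fix 1 d.f. such that

$$-\sum_a q_{aa}\,\pi_a = 1.$$

Scaling: $t = 1$ → 1 expected substitution per site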
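For the two-state chain the constants can be written out explicitly. The sketch below assumes the rate matrix $Q$ with rows $(-\alpha, \alpha)$ and $(\beta, -\beta)$, where $\beta$ denotes the Y → R rate, and checks the closed form against the forward equation by a finite difference. The function name and the parameter values are illustrative.

```python
import math


def two_state_p(alpha, beta, t):
    """Closed-form P(t) for Q = [[-alpha, alpha], [beta, -beta]] (states R, Y)."""
    total = alpha + beta
    decay = math.exp(-total * t)                # e^{-(alpha + beta) t}
    pi_r, pi_y = beta / total, alpha / total    # stationary distribution
    return [[pi_r + pi_y * decay, pi_y * (1.0 - decay)],
            [pi_r * (1.0 - decay), pi_y + pi_r * decay]]


alpha, beta, t, h = 2.0, 1.0, 0.3, 1e-6
p = two_state_p(alpha, beta, t)

# Central finite difference for d/dt p_RY(t).
deriv = (two_state_p(alpha, beta, t + h)[0][1]
         - two_state_p(alpha, beta, t - h)[0][1]) / (2.0 * h)
forward_rhs = alpha * p[0][0] - beta * p[0][1]
print(deriv, forward_rhs)
```

Each entry is a constant plus a multiple of $e^{-(\alpha + \beta)t}$, and as $t$ grows every row converges to the stationary distribution $(\beta, \alpha)/(\alpha + \beta)$.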

Explicit Parameterization of TN93

Nucleotides mutate according to a Markovian process

$$\Pr(X \to Y \text{ in time } t) = e^{tQ_{\text{Nuc}}},$$

where $Q_{\text{Nuc}}$ is a 4x4 infinitesimal rate matrix and $t$ is a branch length. With states ordered $\{A, G, C, T\}$,

$$Q_{\text{Nuc}} = \mu \begin{pmatrix} \cdot & \kappa_1 \pi_G & \pi_C & \pi_T \\ \kappa_1 \pi_A & \cdot & \pi_C & \pi_T \\ \pi_A & \pi_G & \cdot & \kappa_2 \pi_T \\ \pi_A & \pi_G & \kappa_2 \pi_C & \cdot \end{pmatrix},$$

where the diagonal entries ($\cdot$) make each row sum to zero, $\kappa_1$, $\kappa_2$ are transition:transversion rate ratios and $\pi$ is the stationary distribution of $\{A, G, C, T\}$. $\mu$ controls the overall rate and can vary from site to site.

Site-to-Site Rate Variation

Variation occurs quite naturally and is also an important inference:
- short range: codon phase (slow-slow-fast)
- long range: enzymatic active sites, protein folding, immunological pressures/selection

Assume: infinitesimal rates for site $k$ are $r_k \times t \times q_{ab}$. Various priors on $r_k$ with $E(r_k) = 1$. Implicitly Bayesian.

Yang (1994): discretized Gamma distribution
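A minimal sketch of assembling a TN93-style rate matrix, assuming the state order (A, G, C, T), with $\kappa_1$ scaling the A↔G transitions and $\kappa_2$ scaling the C↔T transitions. The function names, the frequencies and the $\kappa$ values are hypothetical, and the normalization shown is the common convention of one expected substitution per unit time ($-\sum_a \pi_a q_{aa} = 1$).

```python
def tn93_rate_matrix(pi, kappa1, kappa2, mu=1.0):
    """Build Q for states ordered (A, G, C, T); each row sums to zero."""
    a, g, c, t = range(4)
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i == j:
                continue
            rate = pi[j]                      # transversions: proportional to pi_j
            if {i, j} == {a, g}:
                rate *= kappa1                # purine transition A <-> G
            elif {i, j} == {c, t}:
                rate *= kappa2                # pyrimidine transition C <-> T
            q[i][j] = mu * rate
        q[i][i] = -sum(q[i])                  # diagonal balances the row
    return q


def normalize(q, pi):
    """Rescale Q so that -sum_a pi_a q_aa = 1."""
    rate = -sum(pi[a] * q[a][a] for a in range(len(pi)))
    return [[x / rate for x in row] for row in q]


pi = [0.3, 0.2, 0.2, 0.3]                     # hypothetical frequencies (A, G, C, T)
q = normalize(tn93_rate_matrix(pi, kappa1=2.0, kappa2=4.0), pi)
for row in q:
    print([round(x, 4) for x in row])
```

Setting `kappa1 == kappa2` gives an HKY85-style matrix, and additionally using equal frequencies with both ratios equal to 1 gives JC69.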

General Time Reversible CTMC

Let $Q = R D_\pi$, where $R$ is symmetric and $D_\pi$ is a diagonal matrix composed of the stationary distribution $\pi$.

- Detailed balance: $\pi_a q_{ab} = \pi_b q_{ba}$. Balance + irreducibility → reversible
- Note $Q$ is similar to a symmetric matrix, as $D_\pi^{1/2} Q D_\pi^{-1/2} = D_\pi^{1/2} R D_\pi^{1/2}$
- Hence, $Q$ must have real eigenvalues and real eigenvectors
- These properties speed up computation of the finite-time transition matrix $P(t) = e^{tQ}$

Calculating the Probability of a Single Site Pattern $Y_i$

Given the tree and unobserved internal node states, the probability is the product of the finite-time mutation probabilities over all branches:

[Figure: four-leaf tree with internal nodes X (root) and Y and branch lengths $t_1, \ldots, t_5$.]

$$L(Y_i) \propto p_{Y_i} = \sum_X \sum_Y \Pr(Y \to s_1, t_1)\,\Pr(X \to G, t_2)\,\Pr(X \to s_3, t_3)\,\Pr(Y \to s_4, t_4)\,\Pr(X \to Y, t_5)\,\pi_X \qquad (1)$$

where $s_1$, $s_3$ and $s_4$ denote the nucleotides observed at the other three leaves.

The number of summands grows rapidly in $N$ → use the sum-product/peeling algorithm to distribute sums across the product
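The similarity argument can be checked numerically. The sketch below builds $Q = R D_\pi$ from hypothetical symmetric exchangeabilities and frequencies, then confirms that $S = D_\pi^{1/2} Q D_\pi^{-1/2}$ is symmetric. The function names and numbers are assumptions for illustration.

```python
import math


def gtr_rate_matrix(exch, pi):
    """Q = R D_pi: q_ab = r_ab * pi_b off the diagonal, rows summing to zero.

    exch: dict {(a, b): r_ab} for a < b, the symmetric exchangeabilities.
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for (a, b), r in exch.items():
        q[a][b] = r * pi[b]
        q[b][a] = r * pi[a]
    for a in range(n):
        q[a][a] = -sum(q[a])          # diagonal balances the row
    return q


def symmetrize(q, pi):
    """S = D^{1/2} Q D^{-1/2}, symmetric whenever Q satisfies detailed balance."""
    n = len(pi)
    return [[math.sqrt(pi[a]) * q[a][b] / math.sqrt(pi[b]) for b in range(n)]
            for a in range(n)]


# Hypothetical exchangeabilities and stationary frequencies.
exch = {(0, 1): 1.0, (0, 2): 0.5, (0, 3): 0.7,
        (1, 2): 0.9, (1, 3): 0.4, (2, 3): 2.0}
pi = [0.1, 0.2, 0.3, 0.4]
q = gtr_rate_matrix(exch, pi)
s = symmetrize(q, pi)
print(max(abs(s[a][b] - s[b][a]) for a in range(4) for b in range(4)))
```

Since $S$ is symmetric it admits a decomposition $S = U \Lambda U^{\top}$ with real $\Lambda$ and orthogonal $U$, so that $e^{tQ} = D_\pi^{-1/2} U e^{t\Lambda} U^{\top} D_\pi^{1/2}$; the decomposition is computed once and reused for every branch length.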

Pruning Algorithm, Felsenstein (1981)

Let $P(L_k \mid a)$ = likelihood of the leaves below node $k$ given that $k$ is in state $a$. Then, recursively compute $P(L_k \mid a)$ given $P(L_i \mid b)$ and $P(L_j \mid c)$ for daughters $i$, $j$ of $k$:

    Set pointer $k \leftarrow 2N - 1$ {the root, initialization}
    Compute $P(L_k \mid a)$ $\forall a$ as follows: {recursion}
    if $k$ is a leaf node then
        if $a$ is observed then
            $P(L_k \mid a) = 1$
        else
            $P(L_k \mid a) = 0$
        end if
    else
        Compute $P(L_i \mid a)$ and $P(L_j \mid a)$ $\forall a$ for daughters $i$, $j$ of $k$ {post-order traversal}
        $P(L_k \mid a) = \sum_b \sum_c \Pr(a \to b, t_i)\,P(L_i \mid b)\,\Pr(a \to c, t_j)\,P(L_j \mid c)$
    end if
    $L(Y_i) \propto \sum_a P(L_{2N-1} \mid a)\,\pi_a$ {termination}

[Figure: node $k$ in state $a$ with daughters in states $b$ and $c$, carrying $P(L_i \mid b)$ and $P(L_j \mid c)$.]

ML Tree or MP Tree?

Reporting uncertainty on tree estimates:

The Bootstrap
- Most common
- Assumes evolutionary events are reproducible. If I went back out to the field and recollected exchangeable data...

Bayesian inference
- Returns the probability of a tree given the observed data and model
- Requires MCMC (e.g., MrBayes or BEAST)
- Advantages
  - Does not rely on asymptotics (hypothesis testing)
  - Naturally incorporates uncertainty in all parameters (including discrete quantities: trees, site-classifications, etc.)
  - Arguably faster algorithms
- Disadvantages
  - Must specify (justifiable) prior distributions
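The pruning recursion can be sketched for a single site as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses JC69 transition probabilities for simplicity, a nested-tuple tree encoding in which each internal node is a pair of (child, branch length) entries, and hypothetical taxa, branch lengths and observations.

```python
import math

STATES = "ACGT"


def jc69_prob(a, b, t):
    """JC69 transition probability; t in expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def partial_likelihoods(node, observed):
    """Return {a: P(L_k | a)} for the subtree rooted at node (post-order)."""
    if isinstance(node, str):
        # Leaf: 1 for the observed state, 0 otherwise.
        return {a: 1.0 if a == observed[node] else 0.0 for a in STATES}
    (left, t_left), (right, t_right) = node
    p_left = partial_likelihoods(left, observed)
    p_right = partial_likelihoods(right, observed)
    # The double sum over (b, c) factorizes into a product of two single sums.
    return {a: sum(jc69_prob(a, b, t_left) * p_left[b] for b in STATES)
               * sum(jc69_prob(a, c, t_right) * p_right[c] for c in STATES)
            for a in STATES}


def site_likelihood(root, observed):
    """Sum the root partials against the stationary distribution (1/4 each)."""
    partials = partial_likelihoods(root, observed)
    return sum(0.25 * partials[a] for a in STATES)


# Hypothetical tree and site pattern.
tree = (((("Human", 0.1), ("Chimp", 0.1)), 0.2),
        ((("Mouse", 0.3), ("Fly", 0.5)), 0.2))
site = {"Human": "A", "Chimp": "A", "Mouse": "G", "Fly": "A"}
print(site_likelihood(tree, site))
```

The cost per site is linear in the number of nodes, instead of growing with the number of joint assignments of internal states.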


Phylogenetics and Darwin. An Introduction to Phylogenetics. Tree of Life. Darwin s Trees

Phylogenetics and Darwin. An Introduction to Phylogenetics. Tree of Life. Darwin s Trees Phylogenetics and Darwin An Introduction to Phylogenetics Bret Larget larget@stat.wisc.edu Departments of Botany and of Statistics University of Wisconsin Madison February 4, 2008 A phylogeny is a tree

More information

PROTEIN PHYLOGENETIC INFERENCE USING MAXIMUM LIKELIHOOD WITH A GENETIC ALGORITHM

PROTEIN PHYLOGENETIC INFERENCE USING MAXIMUM LIKELIHOOD WITH A GENETIC ALGORITHM 512 PROTEN PHYLOGENETC NFERENCE USNG MXMUM LKELHOOD WTH GENETC LGORTHM H. MTSUD Department of nformation and Computer Sciences, Faculty of Engineering Science, Osaka University 1-3 Machikaneyama, Toyonaka,

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Sequence modelling. Marco Saerens (UCL) Slides references

Sequence modelling. Marco Saerens (UCL) Slides references Sequence modelling Marco Saerens (UCL) Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004), Introduction to machine learning.

More information

Consistency Index (CI)

Consistency Index (CI) Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

More information

Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification

Phylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification Multiple sequence alignment global local Evolutionary tree reconstruction Pairwise sequence alignment (global and local) Substitution matrices Gene Finding Protein structure prediction N structure prediction

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

The Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences

The Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences The Causes and Consequences of Variation in Evolutionary Processes Acting on DNA Sequences This dissertation is submitted for the degree of Doctor of Philosophy at the University of Cambridge Lee Nathan

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

Markov Chains. Sarah Filippi Department of Statistics TA: Luke Kelly

Markov Chains. Sarah Filippi Department of Statistics  TA: Luke Kelly Markov Chains Sarah Filippi Department of Statistics http://www.stats.ox.ac.uk/~filippi TA: Luke Kelly With grateful acknowledgements to Prof. Yee Whye Teh's slides from 2013 14. Schedule 09:30-10:30 Lecture:

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides hanks to Paul Lewis, Jeff horne, and Joe Felsenstein for the use of slides Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPM infers a tree from a distance

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.

More information

Phylogeny. November 7, 2017

Phylogeny. November 7, 2017 Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related

More information

Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Phylogenetic Inference using RevBayes

Phylogenetic Inference using RevBayes Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating

More information

An Introduction to Bayesian Inference of Phylogeny

An Introduction to Bayesian Inference of Phylogeny n Introduction to Bayesian Inference of Phylogeny John P. Huelsenbeck, Bruce Rannala 2, and John P. Masly Department of Biology, University of Rochester, Rochester, NY 427, U.S.. 2 Department of Medical

More information

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço jcarrico@fm.ul.pt Charles Darwin (1809-1882) Charles Darwin s tree of life in Notebook B, 1837-1838 Ernst Haeckel (1934-1919)

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Applications of hidden Markov models for comparative gene structure prediction

Applications of hidden Markov models for comparative gene structure prediction pplications of hidden Markov models for comparative gene structure prediction sger Hobolth 1 and Jens Ledet Jensen 2 1 Bioinformatics Research entre, University of arhus 2 Department of heoretical Statistics,

More information

Chapter 16: Reconstructing and Using Phylogenies

Chapter 16: Reconstructing and Using Phylogenies Chapter Review 1. Use the phylogenetic tree shown at the right to complete the following. a. Explain how many clades are indicated: Three: (1) chimpanzee/human, (2) chimpanzee/ human/gorilla, and (3)chimpanzee/human/

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Phylogeny Jan 5, 2016

Phylogeny Jan 5, 2016 גנומיקה חישובית Computational Genomics Phylogeny Jan 5, 2016 Slides: Adi Akavia Nir Friedman s slides at HUJI (based on ALGMB 98) Anders Gorm Pedersen,Technical University of Denmark Sources: Joe Felsenstein

More information

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

More information

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

More information

7.36/7.91 recitation CB Lecture #4

7.36/7.91 recitation CB Lecture #4 7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer

More information

Theory of Evolution Charles Darwin

Theory of Evolution Charles Darwin Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information