Phylogenetic Inference: Building Trees
|
|
- Dwain Francis
- 5 years ago
- Views:
Transcription
1 Phylogenetic Inference: Building rees Philippe Lemey and Marc. Suchard Rega Institute Department of Microbiology and Immunology K.U. Leuven, Belgium, and Departments of Biomathematics and Human Genetics David Geffen School of Medicine at UL Department of Biostatistics UL School of Public Health SISMID p.1 Intra-Host Viral Evolution Retroviruses (and HBV) exist as a quasi-species within infected patients: Shared substitutions may be insufficient to resolve intra-host phylogenies Improve resolution using joint model: 1195 env sequences from 9 HIV+ patients [taken from Rambaut et al. (2004)] Indel rates substitution rates Opportunity to detect intra-host recombination SISMID p.2
2 Reconstructing the ree of Life: re Humans Just Big Slime Molds? Branches lade Support >.99 Scale.10 hermotoga hloroflexus hermus Escheria Streptomyces Pyrococcus Euryarchaeota Nanoarchaeum rchaeoglobus Ferroplasma Methanosarcina Halobacterium Methanothermobacter Methanococcus eropyrum Sulfolobus Pyrobaculum Eubacteria Eocytes Giardia Eukaryotes Entamoeba Saccharum Euglena rypanasoma Dictyostelium Saccharomyces Homo B Uncertain D ertain hermotoga ILVV D GPMPQREHVLLRQVEVPYMIVFINKDM VD DPELIDL hloroflex ILVVS PD GPMPQREHILLRQVQVPIVVFLNKVDM MD DPELLEL Escheria ILVV D GPMPQREHILLGRQVGVPYIIVFLNKDM VD DEELLEL Pyrococcus VLVV D GVMPQKEHFLRLGIKHIIVINKMDM VNYNQKRFEE Halobacterium VLVV DD GVPQREHVFLSRLGIDELIVVNKMDV VDYDESKYNE Methanococus VLVVNV DDKSGIQPQREHVFLIRLGVRQLVVNKMD VNFSEDYNE eropyrum ILVVSRKGEFEGMSE G QREHLLLRMGIEQIIVVNKMDPDVNYDQKRYEF Sulfol obus ILVVSKKGEYEGMSE G QREHIILSKMGINQVIVINKMDLDPYDEKRFKE Giardia ILVVGQGEFEGISKD G QREHLNLGIKMIIVNKMDDGQVKYSKERYDE Euglena VLVIDSGGFEGISKD G QREHLLYLGVKQMIVNKFDDKVKYSQRYEE Dictyostelium VLVISPGEFEGIKN G QREHLLYLGVKQMIVINKMDEKSNYSQRYDE Homo VLIVGVGEFEGISKN G QREHLLYLGVKQLIVGVNKMDSEPPYSQKRYEE hermotoga hloroflex Escheria Pyrococcus Halobacterium Methanococus eropyrum Sulfol obus Giardia Euglena Dictyostelium Homo GIDKDEVERGQVL PGSIKPHKRFKQIYVLKKEEGGRHPFKGYKPQFYIRDV GIERDVERGQVL PGSIKPHKKFEQVYVLKKEEGGRHPFFSGYRPQFYIRDV GIKREEIERGQVL KPGIKPHKFESEVYILSKDEGGRHPFFKGYRPQFYFRDV GVSKNDIKRGDVGHN PPVVRKDFKQIIVLNH PIVGYSPVLHHQV GIGKDDIRRGDVGPDD PPSV DFQQVVVMQH PSVIGYPVFHHQV GVGKKDIKRGDVLGHN PPV DFQIVVLQH PSVLDGYPVFHHQI GVSKSDIKRGDVGHLDK PPV EEFERIFVIWH PSIVGYPVIHVHSV GVEKKDVKRGDVGSVQN PPV DEFQVIVIWH PVGVGYPVLHVHSI GLVKDLKKGYVVGDVNDPPVG KSFQVIVMNH PKKIQPGYPVIDHHI NVSVKDIRRGYVSNKNDPKE DFQVIILNH PGQIGNGYPVLDHHI NVSVKEIKRGMVGDSKNDPPQE EKFVQVIVLNH PGQIHGYSPVLDHHI NVSVKDVRRGNVGDSKNDPPME GFQVIILNH PGQISGYPVLDHHI MP ree Partial u Plot ontentious issue among paleobiologists: Do rchaea (Euryarchaeota/Eocytes) form one or two domains? Weekly World News calls humans slime molds. SISMID p.3 he hicken or the (Small) Genome: Which ame First? Reptiles Mammals Birds log Genome Size Evolutionary History and Genome Sizes of Reptiles, Dinosaurs, Birds and Mammals Issue: Bird genomes are markedly smaller than those from other vertebrates. Question: Did small genomes precede flight or co-evolve? SISMID p.4
3 Maximum Parsimony (MP) Most often used best, not even statistically consistent, but fast, fast, fast...if youknowthetree Key: Findtreewithminimal#of suspected substitutions (internal states are not observed, 0/1 model process) ounting minimum # of substitutions is easy Enumerating (searching through) all possible trees is hard Human GG himp GG Mouse Fly G G Site: long Molecular Sequence Sites are independent Pre Order raversal {,G} X {,} G X Post Order raversal SISMID p.5 Maximum Parsimony (MP) littlehistory: nthony Edwards/Luca avalli-sforza (1963,1964) Both students of R. Fisher Introduced both parsimony and likelihood methods (for continuous quantities, e.g. gene frequencies) in one paper amin and Sokal (1965) provide first program for molecular sequences Fitch and Margoliash (1967) provide efficient algorithm Pre Order raversal {,} {,G} X X G Post Order raversal SISMID p.6
4 Maximum Parsimony lgorithm procedure Fitch and Margoliash (1967) lgorithm cost 0 {Initialization} pointer k 2N 1 {at the root node} o obtain the set R k of possible states at node k {Recursion} if k is leaf then R k observed character for taxon k else ompute R i,r j for daughters i, j of k if R i R j 0then R k R i R j 7 else R k R i R 6 j +1 5 end if end if minimum cost is {ermination} Pre Order raversal {,} {,G} X X G Post Order raversal SISMID p.7 Searching for the MP ree omplexity: Find MP score is NP-complete 5 Branches B 3 Branches 5 Branches B B B D B D 7 Branches Find MP tree is NP-hard B B D 5 Branches Recall that # of N-taxon rooted trees is 3 5 2N 3 ttack exponential-order space Branch-and-Bound: Monotonic order: min PS 2 min PS 3... Bound if min PS k > best n-taxon PS found so far. SISMID p.8
5 Neighbor-Joining (Saitou and Nei, 1987) omputational algorithm: alignment single tree Site (data) Pairwise distances = 0 = 1 = Join (closest) neighbors 12* 1 2 Recompute distances (using 12*,3,4) dvantages: veryfast,greatfor1000sofsequences Disadvantages: nosite-to-siteratevariation,nonatural ways to compare trees/measure data support SISMID p.9 Neighbor-Joining aveat: Pairs i, j with min d ij are not necessarily nearest neighbors. E.g., d B =3<d =5 1 B Solution: Subtractofftheaveragedistancestoallother leaves via 1 D ij = d ij (r i + r j ), r i = d ik, L 2 k L where L is the current set of leaves. Proof in Studier and Keppler (1988). omputational: O(N 3 ) D SISMID p.10
6 Likelihood-based Methods (Felsenstein, 1973) Statistical technique: assumes an unknown tree and a stochastic model for character change along the tree Site (data) state G ontinuous ime Markov hain time Unknown tree/internal node states dvantages: site-to-siterate/treevariationiseasy,can formulate probability statements Disadvantages: must search tree-space slow Foundation of Bayesian Phylogenetics SISMID p.11 raditional Phylogenetic Reconstruction Human himp Mouse Fly Reconstruction Example Site: 1 Human himp G G G G G G long Molecular Sequence Mouse Fly Substitution: singleresidue replaces another Insertion/deletion: residues are inserted or deleted Statistical Model ssume: Homologous sites are iid and site patterns (e.g. dotted box) XY...Z Multinomial(p XY...Z ) where p XY...Z is determined by an unknown tree τ, branch lengths t and continuous-time Markov chain model (for residue substitution) given by infinitesimal rate matrix Q P(X Y in time t) =e tq SISMID p.12
7 M(Q) =ɛ Normal(µ, σ 2 ) of Phylogenetics ontinuous in elapsed time t, discrete in starting/ending state! Memory-less process in which the probability that state b replaces state a during (t, t + s) =sq ab + o(s) state G time Infinitesimal generator matrix Q has off-diagonal entries q ab and row sums =0 hink: ExponentialwaitingtimewithrateR a = b q ab until chain leaves a. hen the new state b is independently chosen with probabilities q ab /R a SISMID p.13 From Infinitesimal to Finite ime Let p ab (t) =the finite-time probability of the chain moving from state a at time 0 to state b at time t, thenmatrix P (t) ={p ab (t)} satisfies with solution d P (t) =P (t)q where P (0) = I dt P (t) =e tq = I + tq (tq)2 + = k=0 1 k! (tq)k as d dt etq = Qe tq = e tq Q for t real SISMID p.14
8 Example: wo-state Model onsider purines (R) pyrimidines (Y). Kolmogorov forward equation: R α Y p RY (t + s) =p RR (t)αs + p RY (t)(1 s)+o(s) yielding d dt p RY(t) =αp RR (t) p RY (t) Q = ( α α ) Solutions of P (t) = e tq have the form with eigenvalues 0 and (α + ) c + de (α+)t SISMID p.15 Standard Ms for Phylogenetics Jukes and antor (J69), π a = 1 4, κ 1 = κ 2 =1 Kimura (K80), π a = 1 4, κ 1 = κ 2 Hasegawa, Kishino and Yano (HKY85), κ 1 = κ 2 (most common) amura and Nei (N93), right κ 1 κ 2 General ime Reversible (GR) Note identifiability concern in e tq.ommonsolutionistofix 1d.f.suchthat q aa π a = 1 a Scaling: t =1 1 expected substitution per site π π G π G π SISMID p.16
9 Explicit Parameterization of N93 Nucleotides mutate according to a Markovian process Pr(X Y in time t) =e tq Nuc where Q Nuc is a 4x4 infinitesimal rate matrix and t is a branch length. π π π G κ 1 0 G κ 1 π G π π κ 1 π π π Q Nuc = π π G κ 2 π κ 2 π π G κ 2 π π 1, where κ 1, κ 2 are transition:transversion rate ratios and π is the stationary distribution of {,G,,}. controls the overall rate and can vary from site-to-site. SISMID p.17 Site-to-Site Rate Variation Variation occurs quite naturally and is also an important inference short range: codon phase (slow-slow-fast) long range: enzymatic active sites, protein folding, immunological pressures/selection ssume: infinitesimalratesforsitek are r k t q ab.variouspriorson r k with E(r k )=1.ImplicitlyBayesian Yang (1994) discretized Gamma distribution SISMID p.18
10 General ime Reversible M Let Q = RD π where R is symmetric and D π is a diagonal matrix composed of the stationary distribution π. Detailed balance π a q ab = π b q ba.balance+ irreducibility reversible Note Q is similar to R, asd 1/2 QD 1/2 = R Hence, Q must have real eigenvalues and real eigenvectors he properties speed up computation of the finite-time transition matrix P (t) =e tq SISMID p.19 alculating the Probability of a Single Site Pattern Y i Given the tree and unobserved internal node states, the probability is the product of the finite time mutation probabilities over all branches: t1 t 2 t Y t 5 4 X t 3 G L(Y i ) p G = Pr(Y,t 1 )Pr(X G,t 2 ) X Y Pr(X,t 3 )Pr(Y,t 4 )Pr(X Y,t 5 )π X (1) Number of sumants grow rapidly in N sum-product/peeling algorithm to distribute sums across the product SISMID p.20
11 Pruning lgorithm Felsenstein (1981) Let P (L k a) =likelihoodofleavesbelownodek given k is in state a. hen,recursively compute P (L k a) given P (L i b) and P (L j c) for daughters i, j of k: Set pointer k 2N 1 {the root, initialization} Node k ompute P (L k a) a as follows: {recursion} if k is a leaf node then if a is observed then P(L P (L k a) =1 i b) else P (L k a) =0 P(L end if j c) else ompute P (L i a) and P (L j a) a for daughters i, j of k {post-order traversal} P (L k a) = P P b c Pr(a b, t i)p (L i b) Pr(a c, t j )P (L j c) end if L(Y i ) P a P (L 2n 1 a)π a {termination} a b a c SISMID p.21 ML ree or MP ree? Reporting uncertainty on tree estimates: he Bootstrap Most common ssumes evolutionary events are reproducible. If I went back out to the field and recollected exchangeable data... Bayesian inference Returns the probability of a tree given the observed data and model Requires MM (e.g., MrBayes or BES) dvantages Does not rely on asymptotics (hypothesis testing) Naturally incorporates uncertainty in all parameters (including discrete quantities: trees, site-classifications, etc.) rguably faster algorithms Disadvantages Must specify (justifiable) prior distributions SISMID p.22
Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More information9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)
I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by
More informationInferring Molecular Phylogeny
Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationReading for Lecture 13 Release v10
Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationBioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony
ioinformatics -- lecture 9 Phylogenetic trees istance-based tree building Parsimony (,(,(,))) rees can be represented in "parenthesis notation". Each set of parentheses represents a branch-point (bifurcation),
More informationAmira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut
Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological
More informationDr. Amira A. AL-Hosary
Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological
More informationPhylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University
Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction
More informationTaming the Beast Workshop
Workshop David Rasmussen & arsten Magnus June 27, 2016 1 / 31 Outline of sequence evolution: rate matrices Markov chain model Variable rates amongst different sites: +Γ Implementation in BES2 2 / 31 genotype
More informationPhylogenetics. BIOL 7711 Computational Bioscience
Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium
More informationBioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics
Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationPOPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics
POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood
More informationEvolutionary Tree Analysis. Overview
CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based
More informationMolecular Evolution and Comparative Genomics
Molecular Evolution and Comparative Genomics --- the phylogenetic HMM model 10-810, CMB lecture 5---Eric Xing Some important dates in history (billions of years ago) Origin of the universe 15 ±4 Formation
More informationAdditive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.
Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then
More informationMaximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018
Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras
More informationBayesian Analysis of Elapsed Times in Continuous-Time Markov Chains
Bayesian Analysis of Elapsed Times in Continuous-Time Markov Chains Marco A. R. Ferreira 1, Marc A. Suchard 2,3,4 1 Department of Statistics, University of Missouri at Columbia, USA 2 Department of Biomathematics
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods
More informationReconstruire le passé biologique modèles, méthodes, performances, limites
Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire
More informationMolecular Evolution and Phylogenetic Tree Reconstruction
1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationPhylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University
Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary
More informationModern Phylogenetics. An Introduction to Phylogenetics. Phylogenetics and Systematics. Phylogenetic Tree of Whales
Modern Phylogenetics n Introduction to Phylogenetics ret Larget larget@stat.wisc.edu epartments of otany and of Statistics University of Wisconsin Madison January 27, 2010 Phylogenies are usually estimated
More informationLecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)
Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from
More informationEarly History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species
Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0
More informationSubstitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A
GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC
More informationPhylogenetics: Building Phylogenetic Trees
1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should
More informationMolecular phylogeny How to infer phylogenetic trees using molecular sequences
Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues
More informationMolecular phylogeny How to infer phylogenetic trees using molecular sequences
Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues
More informationPhylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.
Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony
More informationAlgorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,
Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837
More informationEvolutionary Analysis of Viral Genomes
University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral
More informationLecture Notes: Markov chains
Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity
More informationPhylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline
Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying
More informationLetter to the Editor. Department of Biology, Arizona State University
Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona
More informationMolecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016
Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,
More informationLecture 4. Models of DNA and protein change. Likelihood methods
Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36
More informationEstimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057
Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number
More informationDNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi
DNA Phylogeny Signals and Systems in Biology Kushal Shah @ EE, IIT Delhi Phylogenetics Grouping and Division of organisms Keeps changing with time Splitting, hybridization and termination Cladistics :
More informationMutation models I: basic nucleotide sequence mutation models
Mutation models I: basic nucleotide sequence mutation models Peter Beerli September 3, 009 Mutations are irreversible changes in the DNA. This changes may be introduced by chance, by chemical agents, or
More informationThe Phylogenetic Handbook
The Phylogenetic Handbook A Practical Approach to DNA and Protein Phylogeny Edited by Marco Salemi University of California, Irvine and Katholieke Universiteit Leuven, Belgium and Anne-Mieke Vandamme Rega
More informationWhat is Phylogenetics
What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)
More informationPhylogeny Tree Algorithms
Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some
More informationChapter 3: Phylogenetics
Chapter 3: Phylogenetics 3. Computing Phylogeny Prof. Yechiam Yemini (YY) Computer Science epartment Columbia niversity Overview Computing trees istance-based techniques Maximal Parsimony (MP) techniques
More informationUsing phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)
Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures
More informationSequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of
More informationLecture 3: Markov chains.
1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.
More informationWeek 5: Distance methods, DNA and protein models
Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03
More informationPhylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University
Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X
More informationCS5263 Bioinformatics. Guest Lecture Part II Phylogenetics
CS5263 Bioinformatics Guest Lecture Part II Phylogenetics Up to now we have focused on finding similarities, now we start focusing on differences (dissimilarities leading to distance measures). Identifying
More informationCSCI1950 Z Computa4onal Methods for Biology Lecture 5
CSCI1950 Z Computa4onal Methods for Biology Lecture 5 Ben Raphael February 6, 2009 hip://cs.brown.edu/courses/csci1950 z/ Alignment vs. Distance Matrix Mouse: ACAGTGACGCCACACACGT Gorilla: CCTGCGACGTAACAAACGC
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally
More informationNon-Parametric Bayesian Population Dynamics Inference
Non-Parametric Bayesian Population Dynamics Inference Philippe Lemey and Marc A. Suchard Department of Microbiology and Immunology K.U. Leuven, Belgium, and Departments of Biomathematics, Biostatistics
More informationModeling Noise in Genetic Sequences
Modeling Noise in Genetic Sequences M. Radavičius 1 and T. Rekašius 2 1 Institute of Mathematics and Informatics, Vilnius, Lithuania 2 Vilnius Gediminas Technical University, Vilnius, Lithuania 1. Introduction:
More informationBMI/CS 776 Lecture 4. Colin Dewey
BMI/CS 776 Lecture 4 Colin Dewey 2007.02.01 Outline Common nucleotide substitution models Directed graphical models Ancestral sequence inference Poisson process continuous Markov process X t0 X t1 X t2
More informationMaximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A
J Mol Evol (2000) 51:423 432 DOI: 10.1007/s002390010105 Springer-Verlag New York Inc. 2000 Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus
More informationComputational Genomics
omputational Genomics Molecular Evolution: Phylogenetic trees Eric Xing Lecture, March, 2007 Reading: DTW boo, hap 2 DEKM boo, hap 7, Phylogeny In, Ernst Haecel coined the word phylogeny and presented
More informationWhat Is Conservation?
What Is Conservation? Lee A. Newberg February 22, 2005 A Central Dogma Junk DNA mutates at a background rate, but functional DNA exhibits conservation. Today s Question What is this conservation? Lee A.
More informationSome of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!
Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis
More informationPhylogenetics and Darwin. An Introduction to Phylogenetics. Tree of Life. Darwin s Trees
Phylogenetics and Darwin An Introduction to Phylogenetics Bret Larget larget@stat.wisc.edu Departments of Botany and of Statistics University of Wisconsin Madison February 4, 2008 A phylogeny is a tree
More informationPROTEIN PHYLOGENETIC INFERENCE USING MAXIMUM LIKELIHOOD WITH A GENETIC ALGORITHM
512 PROTEN PHYLOGENETC NFERENCE USNG MXMUM LKELHOOD WTH GENETC LGORTHM H. MTSUD Department of nformation and Computer Sciences, Faculty of Engineering Science, Osaka University 1-3 Machikaneyama, Toyonaka,
More informationA (short) introduction to phylogenetics
A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field
More informationSequence modelling. Marco Saerens (UCL) Slides references
Sequence modelling Marco Saerens (UCL) Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004), Introduction to machine learning.
More informationConsistency Index (CI)
Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)
More informationPhylogeny. Properties of Trees. Properties of Trees. Trees represent the order of branching only. Phylogeny: Taxon: a unit of classification
Multiple sequence alignment global local Evolutionary tree reconstruction Pairwise sequence alignment (global and local) Substitution matrices Gene Finding Protein structure prediction N structure prediction
More informationLecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).
1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff
More informationPhylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center
Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods
More informationMultiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas
n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming
More information7. Tests for selection
Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info
More informationThe Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences
The Causes and Consequences of Variation in Evolutionary Processes Acting on DNA Sequences This dissertation is submitted for the degree of Doctor of Philosophy at the University of Cambridge Lee Nathan
More informationEstimating Evolutionary Trees. Phylogenetic Methods
Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent
More informationMarkov Chains. Sarah Filippi Department of Statistics TA: Luke Kelly
Markov Chains Sarah Filippi Department of Statistics http://www.stats.ox.ac.uk/~filippi TA: Luke Kelly With grateful acknowledgements to Prof. Yee Whye Teh's slides from 2013 14. Schedule 09:30-10:30 Lecture:
More informationMassachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution
Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral
More informationInferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution
Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods
More informationThanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides
hanks to Paul Lewis, Jeff horne, and Joe Felsenstein for the use of slides Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPM infers a tree from a distance
More informationLecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22
Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and
More informationPhylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science
Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.
More informationPhylogeny. November 7, 2017
Phylogeny November 7, 2017 Phylogenetics Phylon = tribe/race, genetikos = relative to birth Phylogenetics: study of evolutionary relationships among organisms, sequences, or anything in between Related
More informationPhylogenetic trees 07/10/13
Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationPhylogeny: building the tree of life
Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan
More informationHMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM
I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington
More informationPhylogenetic Inference using RevBayes
Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating
More informationAn Introduction to Bayesian Inference of Phylogeny
n Introduction to Bayesian Inference of Phylogeny John P. Huelsenbeck, Bruce Rannala 2, and John P. Masly Department of Biology, University of Rochester, Rochester, NY 427, U.S.. 2 Department of Medical
More informationMolecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço
Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço jcarrico@fm.ul.pt Charles Darwin (1809-1882) Charles Darwin s tree of life in Notebook B, 1837-1838 Ernst Haeckel (1934-1919)
More informationLecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26
Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and
More informationApplications of hidden Markov models for comparative gene structure prediction
pplications of hidden Markov models for comparative gene structure prediction sger Hobolth 1 and Jens Ledet Jensen 2 1 Bioinformatics Research entre, University of arhus 2 Department of heoretical Statistics,
More informationChapter 16: Reconstructing and Using Phylogenies
Chapter Review 1. Use the phylogenetic tree shown at the right to complete the following. a. Explain how many clades are indicated: Three: (1) chimpanzee/human, (2) chimpanzee/ human/gorilla, and (3)chimpanzee/human/
More informationPhylogenetic inference
Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types
More informationPhylogeny Jan 5, 2016
גנומיקה חישובית Computational Genomics Phylogeny Jan 5, 2016 Slides: Adi Akavia Nir Friedman s slides at HUJI (based on ALGMB 98) Anders Gorm Pedersen,Technical University of Denmark Sources: Joe Felsenstein
More informationImproving divergence time estimation in phylogenetics: more taxa vs. longer sequences
Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal
More informationPage 1. Evolutionary Trees. Why build evolutionary tree? Outline
Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny
More information7.36/7.91 recitation CB Lecture #4
7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer
More informationTheory of Evolution Charles Darwin
Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties
More information