Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions
Transcription
1 PLGW05: Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions
Joint work with Ilan Gronau (2), Shlomo Moran (3), and Irad Yavneh
(2) Dept. of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA
(3) Computer Science Dept., Technion, Haifa, Israel
June 24, 2011
2-3 Motivation
[Figure: an unknown tree X over the taxa A, B, C, D with edge lengths $t_1, \ldots, t_6$ and internal edge $t_{int}$, next to a substitution diagram over the nucleotides A, G, C, T.]
Observed sequences: A: ATCA..., B: ATGA..., C: ATTA..., D: TTTA...
A substitution rate (SR) function (distance measure) $\Delta$ maps each pair of sequences to an estimated distance:
$D_{AB} = \Delta(\text{ATCA...}, \text{ATGA...})$
$D_{AC} = \Delta(\text{ATCA...}, \text{ATTA...})$
$D_{AD} = \Delta(\text{ATCA...}, \text{TTTA...})$
$D_{BC} = \Delta(\text{ATGA...}, \text{ATTA...})$
...
What distance measure should I choose?
4 Motivation: An ongoing quest...
Previous works:
- I. Gronau, S. Moran, and I. Yavneh. Towards optimal distance functions for stochastic substitution models. J Theor Biol, 260(2).
- I. Gronau, S. Moran, and I. Yavneh. Adaptive distance measures for resolving K2P quartets: Metric separation versus stochastic noise. J Comp Biol, 17(11).
5 SR functions: Distance measures
Kimura two-parameter (K2P) model
[Figure: substitution diagram over A, G, C, T with transition rate $\alpha$ and transversion rate $\beta$.]
- $\alpha$ = transitions, $\beta$ = transversions
- transition-to-transversion ratio $R = \alpha / (2\beta)$
- biological evidence that $\alpha > \beta$
- normalization: $\alpha + 2\beta = 1$
Distance measures
[Figure: path through nodes u, v, w with $d_{uw} = d_{uv} + d_{vw}$.]
- additive distance measures induce tree metrics in homogeneous models
- distance measures are given by SR functions $\Delta(t)$
- additive distance measure in K2P (standard SR function): $\Delta_{K2P}(t) = \alpha t + 2\beta t = t$
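To make the two distance measures concrete, here is a minimal Python sketch of the textbook JC and K2P estimators applied to a pair of aligned sequences. The function names are illustrative and not taken from the talk; the formulas are the standard closed forms, expressed through the observed transition proportion P and transversion proportion Q.

```python
import math

# Transitions are purine<->purine (A, G) and pyrimidine<->pyrimidine (C, T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def site_proportions(seq1, seq2):
    """Return (P, Q): observed proportions of transitions and transversions."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    n = len(seq1)
    ti = sum((a, b) in TRANSITIONS for a, b in zip(seq1, seq2))
    diff = sum(a != b for a, b in zip(seq1, seq2))
    return ti / n, (diff - ti) / n

def jc_distance(seq1, seq2):
    """Jukes-Cantor estimate; uses only the total mismatch proportion P + Q."""
    P, Q = site_proportions(seq1, seq2)
    # math.log raises ValueError once the sequences are saturated.
    return -0.75 * math.log(1.0 - 4.0 * (P + Q) / 3.0)

def k2p_distance(seq1, seq2):
    """Kimura two-parameter estimate; treats transitions and transversions separately."""
    P, Q = site_proportions(seq1, seq2)
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)
```

Both functions read the same alignment columns; they differ only in how the mismatch counts are corrected for multiple substitutions.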
6-8 Evaluation of SR functions: Experiment on Hasegawa's tree
[Figure: Hasegawa's tree over Mouse, Bovine, Gibbon, Orang, Gorilla, Chimp and Human. Plot: average normalized RF distance vs. tree diameter for K2P and JC; sequence length = 500 bp, R = 2.]
1. simulate evolution according to the K2P model with ti-tv ratio R = 2
2. for each tree size, generate batches of 7-way alignments of 500 bp length
3. measure the Robinson-Foulds (RF) tree distance to the true tree
JC is statistically consistent w.r.t. Hasegawa's tree!
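Step 3 of the experiment can be sketched as follows. This is a minimal illustration, assuming trees are written as nested tuples of leaf names and that "normalized" means division by 2(n - 3), the maximum RF distance between two binary unrooted trees on n leaves; the slides do not state which normalization was used.

```python
def splits(tree, taxa):
    """Non-trivial bipartitions of a tree given as nested tuples of leaf names."""
    taxa = frozenset(taxa)
    ref = min(taxa)  # fixed reference taxon used to pick a canonical side
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        # Skip trivial splits (single leaf on one side) and the root itself.
        if 1 < len(below) < len(taxa) - 1:
            found.add(below if ref not in below else taxa - below)
        return below

    leaves(tree)
    return found

def normalized_rf(tree_a, tree_b, taxa):
    """Symmetric difference of split sets, divided by the assumed maximum 2(n - 3)."""
    sa, sb = splits(tree_a, taxa), splits(tree_b, taxa)
    return len(sa ^ sb) / (2 * (len(taxa) - 3))

taxa = ["A", "B", "C", "D", "E"]
true_tree = (("A", "B"), "C", ("D", "E"))
estimate = (("A", "C"), "B", ("D", "E"))
print(normalized_rf(true_tree, estimate, taxa))
```

Averaging this value over a batch of reconstructed trees gives a quantity of the kind plotted on the y-axis.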
9-10 Evaluation of SR functions: Non-additive SR functions
[Figure: estimated distance $\Delta(t)$ vs. evolutionary time t for JC and K2P, with the standard deviations $\sigma(\Delta_{JC})$ and $\sigma(\Delta_{K2P})$; ti-tv ratio: 10, sequence length: 1000 bp.]
- the Jukes-Cantor (JC) model is a homogeneous submodel of K2P for $R = 1/2$
- $\Delta_{JC}$ deviates from additivity in K2P and in homogeneous submodels for $R > 0.5$
- $\Delta_{JC}$ induces a near-additive metric w.r.t. Hasegawa's tree
- $\Delta_{JC}$ has lower stochastic variance than $\Delta_{K2P}$
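The non-additivity of JC under K2P can be checked numerically. The sketch below assumes the standard K2P site-pattern probabilities with rates normalized as on the earlier slide ($\alpha + 2\beta = 1$, $R = \alpha / (2\beta)$) and evaluates both SR functions on the expected mismatch proportions, i.e. in the limit of infinite sequence length. Function names are illustrative.

```python
import math

def k2p_rates(R):
    """Rates (alpha, beta) with R = alpha / (2 * beta) and alpha + 2 * beta = 1."""
    beta = 1.0 / (2.0 * (R + 1.0))
    alpha = R / (R + 1.0)
    return alpha, beta

def expected_proportions(t, R):
    """Expected transition (P) and transversion (Q) proportions after time t under K2P."""
    alpha, beta = k2p_rates(R)
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    return 0.25 + 0.25 * e1 - 0.5 * e2, 0.5 - 0.5 * e1

def delta_jc(t, R):
    """JC correction applied to K2P data: equals t only when R = 1/2."""
    P, Q = expected_proportions(t, R)
    return -0.75 * math.log(1.0 - 4.0 * (P + Q) / 3.0)

def delta_k2p(t, R):
    """K2P correction applied to K2P data: recovers t for every R."""
    P, Q = expected_proportions(t, R)
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)

for t in (0.25, 0.5, 1.0, 2.0):
    print(t, round(delta_k2p(t, 10.0), 4), round(delta_jc(t, 10.0), 4))
```

For R = 10 the JC values fall increasingly below t as t grows, which is the bending of the JC curve that the bullets describe.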
11-14 Evaluation of SR functions: Non-additive SR functions
[Figure: estimated distance $\Delta(t)$ vs. evolutionary time t on the interval $[t_0, t_1]$ for $\Delta_{JC}$ and $\Delta_{aff} = A \, \Delta_{K2P} + b$, with $\sigma(\Delta_{JC})$, $\sigma(\Delta_{aff})$ and the fixed error $dev(\Delta_{JC}, [t_0, t_1])$ marked; Hasegawa's tree shown alongside; ti-tv ratio: 10, sequence length: 1000 bp.]
- the affine-additive transformation $\Delta_{aff} = A \, \Delta + b$ remains additive
- this allows analysis of non-additive SR functions
- deviation from additivity in $[t_0, t_1]$: $\frac{1}{a} \max\{ |\Delta(t) - a t - b| : t \in [t_0, t_1] \}$
- check consistency using the near-additivity theorem (Atteson, 1999)
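The deviation formula can be evaluated numerically for any SR function. The sketch below is only one possible reading: it fixes the slope a to the chord slope over $[t_0, t_1]$ and then centres b between the extreme residuals, which minimizes the maximum error for that slope. The slides do not say how a and b are chosen, so both choices are assumptions, as is the grid-based maximum.

```python
def deviation_from_additivity(delta, t0, t1, n=1001):
    """Return (a, b, dev) with dev = (1/a) * max |delta(t) - a*t - b| on a grid of [t0, t1]."""
    ts = [t0 + (t1 - t0) * i / (n - 1) for i in range(n)]
    # Assumed choice: slope of the chord between the interval endpoints (must be > 0).
    a = (delta(t1) - delta(t0)) / (t1 - t0)
    residuals = [delta(t) - a * t for t in ts]
    # Centre the intercept so the largest positive and negative errors are equal.
    b = 0.5 * (max(residuals) + min(residuals))
    dev = max(abs(r - b) for r in residuals) / a
    return a, b, dev

# An additive (affine) function has zero deviation; a curved one does not.
print(deviation_from_additivity(lambda t: 2.0 * t + 1.0, 0.0, 1.0))
print(deviation_from_additivity(lambda t: t * t, 0.0, 1.0))
```

Passing an SR function such as the expected JC distance under K2P as `delta` gives a deviation over a tree's range of pairwise distances, which could then be compared against the bound of a near-additivity argument.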
15-17 Experiments on Quartets: Two extreme archetypes of quartets with long and short edges
[Figure: Felsenstein quartet and Farris quartet over the taxa A, B, C, D, with short edges $t_s$ and long edges $t_l = 5 t_s$. Plots: failure rate of $\Delta_{JC}$ and $\Delta_{K2P}$ on each quartet; sequence length = 500 bp, R = 5.]
Felsenstein quartet:
- underestimation of $t_0 + t_1$ decreases separation of the split AB|CD
- biased towards AC|BD
- despite this impedimental bias, JC performs better than K2P for moderate $t_l / t_s$ ratios, e.g. $t_l = 3.5 \, t_s$
Farris quartet:
- underestimation of $t_0 + t_1$ increases separation of the split AB|CD
- bias towards the correct split
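The bias on the Felsenstein quartet can be illustrated with the four-point method, which picks the split with the smallest sum of pairwise distances. In this sketch the leaf labelling (A and C on long edges, B and D on short edges, true split AB|CD) and the saturating transform `1 - exp(-2x)` are hypothetical stand-ins for a distance measure that underestimates long distances; neither is taken from the talk.

```python
import math

def resolve_quartet(d):
    """Four-point method: return the split whose two distances have the smallest sum."""
    sums = {
        "AB|CD": d["AB"] + d["CD"],
        "AC|BD": d["AC"] + d["BD"],
        "AD|BC": d["AD"] + d["BC"],
    }
    return min(sums, key=sums.get)

def felsenstein_quartet(t_s, t_l, t_i):
    """Path lengths for true split AB|CD with long pendant edges at A and C (assumed layout)."""
    return {
        "AB": t_l + t_s, "CD": t_l + t_s,
        "AC": 2 * t_l + t_i, "BD": 2 * t_s + t_i,
        "AD": t_l + t_i + t_s, "BC": t_s + t_i + t_l,
    }

true_d = felsenstein_quartet(t_s=0.2, t_l=1.0, t_i=0.2)
# Hypothetical distortion that compresses long distances more than short ones.
saturated = {pair: 1.0 - math.exp(-2.0 * x) for pair, x in true_d.items()}
print(resolve_quartet(true_d), resolve_quartet(saturated))
```

With these numbers the additive distances recover AB|CD, while the saturated distances select AC|BD, joining the two long edges.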
18 Fisher's Linear Discriminant
Fisher's Linear Discriminant (FLD) measures separation between independent normally-distributed random variables $X \sim N(\mu_1, \sigma_1)$ and $Y \sim N(\mu_2, \sigma_2)$:
$FLD(X, Y) = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} = \frac{SEP(\Delta)}{NOISE(\Delta)}$
[Figure: quartet over A, B, C, D with internal edge $w_{int}$.]
$\mu_1 = D_{AC} + D_{BD}$
$\mu_2 = D_{AB} + D_{CD}$
$\sigma_1^2 = \sigma^2(D_{AC}) + \sigma^2(D_{BD})$
$\sigma_2^2 = \sigma^2(D_{AB}) + \sigma^2(D_{CD})$
19 Fisher's Linear Discriminant: FLD on Felsenstein's quartet
$t_i = 0.2$, $t_l = 1$, $t_s \in [0.2, 1]$
[Figure: Felsenstein quartet with edges $t_i$, $t_s$ and $t_l$, and plots comparing simulation results with the FLD-based prediction.]
Simulation: 100,000 trees per data point, sequence length of 1000 bp. Prediction based on FLD.
20-21 Modelling Separation and Noise with FLD: Separation vs. Noise
Fisher's Linear Discriminant measures separation between independent normally-distributed random variables $X \sim N(\mu_1, \sigma_1)$ and $Y \sim N(\mu_2, \sigma_2)$:
$FLD(X, Y) = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} = \frac{SEP(\Delta)}{NOISE(\Delta)}$
$SEP(\Delta) = \mu_1 - \mu_2$, $NOISE(\Delta) = \sqrt{\sigma_1^2 + \sigma_2^2}$
$\frac{FLD(\Delta_1)}{FLD(\Delta_2)} = \frac{SEP(\Delta_1)}{SEP(\Delta_2)} \Big/ \frac{NOISE(\Delta_1)}{NOISE(\Delta_2)}$
- independent of sequence length
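The FLD comparison of two SR functions can be written down directly. This sketch assumes the square-root form of NOISE and takes the expected distances and their variances as given inputs; the dictionary layout and the numbers in the usage lines are hypothetical.

```python
import math

def sep_and_noise(dist, var):
    """SEP and NOISE for the split AB|CD against AC|BD from expected distances and variances."""
    mu1 = dist["AC"] + dist["BD"]
    mu2 = dist["AB"] + dist["CD"]
    var1 = var["AC"] + var["BD"]
    var2 = var["AB"] + var["CD"]
    # Assumed form: NOISE is the standard deviation of the difference of the two sums.
    return mu1 - mu2, math.sqrt(var1 + var2)

def fld(dist, var):
    """Fisher's Linear Discriminant as SEP / NOISE."""
    sep, noise = sep_and_noise(dist, var)
    return sep / noise

# Hypothetical inputs for one SR function on one quartet.
dist = {"AB": 1.2, "CD": 1.2, "AC": 2.2, "BD": 0.6}
var = {"AB": 0.01, "CD": 0.01, "AC": 0.015, "BD": 0.005}
print(fld(dist, var))
```

If every variance shrinks by the same factor as the sequence length grows, NOISE rescales identically for both SR functions, so the ratio of their FLD values is unchanged; this is the sense in which the comparison is independent of sequence length.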
22 Modelling Separation and Noise with FLD: Separation between noise and deviation from additivity
[Figure: two panels, R = 5 and R = 2.]
23 Experiment on biological data
- 163 bacterial species, 31 marker genes (Ciccarelli et al., 2006)
- sample 40,000 random 10-species sub-alignments
- extract four-fold degenerate sites
- reference tree from the Living Tree Project (ARB-Silva)
[Figure: axis labels "difference from RF of BIONJ GTR tree", "number of trees" and "RF between BIONJ GTR tree and LTP tree"; series: JC, K2P, LogDet.]
24 Summary
Surprising observation: non-additive SR functions can improve reconstruction accuracy.
Example: the JC SR function in K2P trees.
Introduced concepts:
- deviation from additivity
- affine-additive SR function
- SEP and NOISE
More information in our (soon published) WABI paper.
25 The end
Thank you!
26 Selected references
- K. Atteson. The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica, 25, 1999.
- D. Doerr, I. Gronau, S. Moran, and I. Yavneh. Stochastic errors vs. modeling errors in distance based phylogenetic reconstructions. In preparation.
- J. Felsenstein. Inferring Phylogenies. Sinauer Associates, 2nd edition, September.
- I. Gronau, S. Moran, and I. Yavneh. Towards optimal distance functions for stochastic substitution models. J Theor Biol, 260(2).
- I. Gronau, S. Moran, and I. Yavneh. Adaptive distance measures for resolving K2P quartets: Metric separation versus stochastic noise. J Comp Biol, 17(11).
- M. Kimura. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16(2), June.
More informationSequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of
More informationAlgorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,
Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837
More informationWeek 5: Distance methods, DNA and protein models
Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03
More informationELIZABETH S. ALLMAN and JOHN A. RHODES ABSTRACT 1. INTRODUCTION
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 13, Number 5, 2006 Mary Ann Liebert, Inc. Pp. 1101 1113 The Identifiability of Tree Topology for Phylogenetic Models, Including Covarion and Mixture Models ELIZABETH
More informationThe Phylogenetic Handbook
The Phylogenetic Handbook A Practical Approach to DNA and Protein Phylogeny Edited by Marco Salemi University of California, Irvine and Katholieke Universiteit Leuven, Belgium and Anne-Mieke Vandamme Rega
More informationWeighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction
Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction William J. Bruno,* Nicholas D. Socci, and Aaron L. Halpern *Theoretical Biology and Biophysics, Los Alamos
More information7. Tests for selection
Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info
More informationMolecular Evolution, course # Final Exam, May 3, 2006
Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least
More informationOn the Uniqueness of the Selection Criterion in Neighbor-Joining
Journal of Classification 22:3-15 (2005) DOI: 10.1007/s00357-005-0003-x On the Uniqueness of the Selection Criterion in Neighbor-Joining David Bryant McGill University, Montreal Abstract: The Neighbor-Joining
More informationBIOINFORMATICS DISCOVERY NOTE
BIOINFORMATICS DISCOVERY NOTE Designing Fast Converging Phylogenetic Methods!" #%$&('$*),+"-%./ 0/132-%$ 0*)543768$'9;:(0'=A@B2$0*)A@B'9;9CD
More informationAn Investigation of Phylogenetic Likelihood Methods
An Investigation of Phylogenetic Likelihood Methods Tiffani L. Williams and Bernard M.E. Moret Department of Computer Science University of New Mexico Albuquerque, NM 87131-1386 Email: tlw,moret @cs.unm.edu
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationMaximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018
Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras
More informationA Minimum Spanning Tree Framework for Inferring Phylogenies
A Minimum Spanning Tree Framework for Inferring Phylogenies Daniel Giannico Adkins Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-010-157
More informationPhylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches
Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell
More informationConsistency Index (CI)
Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)
More informationX X (2) X Pr(X = x θ) (3)
Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree
More informationA Statistical Test of Phylogenies Estimated from Sequence Data
A Statistical Test of Phylogenies Estimated from Sequence Data Wen-Hsiung Li Center for Demographic and Population Genetics, University of Texas A simple approach to testing the significance of the branching
More informationRELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG
RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG Department of Biology (Galton Laboratory), University College London, 4 Stephenson
More informationMolecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016
Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,
More informationPhylogeny: traditional and Bayesian approaches
Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent
More informationPhylogenetic Inference and Hypothesis Testing. Catherine Lai (92720) BSc(Hons) Department of Mathematics and Statistics University of Melbourne
Phylogenetic Inference and Hypothesis Testing Catherine Lai (92720) BSc(Hons) Department of Mathematics and Statistics University of Melbourne November 13, 2003 Contents 1 Introduction 4 2 Molecular Phylogenetics
More informationThe Generalized Neighbor Joining method
The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Mathematics Duke University Joint work with Dan Levy and Lior Pachter www.math.duke.edu/ ruriko data mining 1 Challenge We would like to
More informationAlgebraic Statistics Tutorial I
Algebraic Statistics Tutorial I Seth Sullivant North Carolina State University June 9, 2012 Seth Sullivant (NCSU) Algebraic Statistics June 9, 2012 1 / 34 Introduction to Algebraic Geometry Let R[p] =
More informationThe least-squares approach to phylogenetics was first suggested
Combinatorics of least-squares trees Radu Mihaescu and Lior Pachter Departments of Mathematics and Computer Science, University of California, Berkeley, CA 94704; Edited by Peter J. Bickel, University
More informationarxiv:q-bio/ v1 [q-bio.pe] 27 May 2005
Maximum Likelihood Jukes-Cantor Triplets: Analytic Solutions arxiv:q-bio/0505054v1 [q-bio.pe] 27 May 2005 Benny Chor Michael D. Hendy Sagi Snir December 21, 2017 Abstract Complex systems of polynomial
More information