reconciling trees Stefanie Hartmann postdoc, Todd Vision s lab University of North Carolina the data

Similar documents
Comparative Genomics II

Inferring Species Trees From Gene Duplication Episodes

A phylogenomic toolbox for assembling the tree of life

Algorithms for phylogeny construction

Parsimony via Consensus

Reconciliation with Non-binary Gene Trees Revisited

PGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species

Algorithms for efficient phylogenetic tree construction

GenPhyloData: realistic simulation of gene family evolution

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Improved Gene Tree Error-Correction in the Presence of Horizontal Gene Transfer

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Session 5: Phylogenomics

Bioinformatics and Genomics Program, Center for Genomic Regulation, Doctor Aiguader, 88, Barcelona, Spain.

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore

Gene Trees, Species Trees, and Species Networks

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Algorithms in Bioinformatics

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogenetic inference

Evolutionary Tree Analysis. Overview

Effects of Gap Open and Gap Extension Penalties

Exploiting Gene Families for Phylogenomic Analysis of Myzostomid Transcriptome Data

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

molecular evolution and phylogenetics

What is Phylogenetics

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Constructing Evolutionary/Phylogenetic Trees

C.DARWIN ( )

Using Trees: Myrmecocystus Phylogeny and Character Evolution and New Methods for Investigating Trait Evolution and Species Delimitation

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Dr. Amira A. AL-Hosary

Computational methods for predicting protein-protein interactions

Supertree Algorithms for Ancestral Divergence Dates and Nested Taxa

Charles Semple, Philip Daniel, Wim Hordijk, Roderic D M Page, and Mike Steel

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Consensus methods. Strict consensus methods

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Analysis of Gene Order Evolution beyond Single-Copy Genes

Phylogenetics in the Age of Genomics: Prospects and Challenges

Ratio of explanatory power (REP): A new measure of group support

Phylogenetic Networks, Trees, and Clusters

BLAST. Varieties of BLAST

C3020 Molecular Evolution. Exercises #3: Phylogenetics

WenEtAl-biorxiv 2017/12/21 10:55 page 2 #2

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

A (short) introduction to phylogenetics

How to read and make phylogenetic trees Zuzana Starostová

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

PHYLOGENY AND SYSTEMATICS

Phylogenetic Tree Reconstruction

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

Constructing Evolutionary/Phylogenetic Trees

The PhyLoTA Browser: Processing GenBank for Molecular Phylogenetics Research

Phylogenetic analyses. Kirsi Kostamo

Phylogenetic molecular function annotation

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

MOLECULAR SYSTEMATICS: A SYNTHESIS OF THE COMMON METHODS AND THE STATE OF KNOWLEDGE

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Pogil Phylogenetic Trees Answer Key Ap Biology

Gene Tree Parsimony for Incomplete Gene Trees

Theory of Evolution Charles Darwin

Molecular phylogeny How to infer phylogenetic trees using molecular sequences


8/23/2014. Phylogeny and the Tree of Life

Reconstruire le passé biologique modèles, méthodes, performances, limites

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Missing the Forest for the Trees: Phylogenetic Compression and Its Implications for Inferring Complex Evolutionary Histories

Consensus Methods. * You are only responsible for the first two

Edward Susko Department of Mathematics and Statistics, Dalhousie University. Introduction. Installation

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Chapter 9. Inferring Orthology and Paralogy. Adrian M. Altenhoff and Christophe Dessimoz. Abstract. 1. Introduction

OMICS Journals are welcoming Submissions

Lecture 8 Multiple Alignment and Phylogeny

EVOLUTIONARY DISTANCES

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Mul$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

ALGORITHMIC STRATEGIES FOR ESTIMATING THE AMOUNT OF RETICULATION FROM A COLLECTION OF GENE TREES

PSODA: Better Tasting and Less Filling Than PAUP

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:

CS5263 Bioinformatics. Guest Lecture Part II Phylogenetics

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Gel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

Lecture 6 Phylogenetic Inference

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Increasing Data Transparency and Estimating Phylogenetic Uncertainty in Supertrees: Approaches Using Nonparametric Bootstrapping

Package WGDgc. June 3, 2014

Transcription:

reconciling trees Stefanie Hartmann postdoc, Todd Vision s lab University of North Carolina 1 the data alignments and phylogenies for ~27,000 gene families from 140 plant species www.phytome.org publicly available DNA sequence data 813,840 protein sequences 26,870 gene families and 19,483 singletons 2

the goal reconcile the species tree with the gene trees to infer duplication / speciation nodes species tree gene trees 3 the goal reconcile the species tree with the gene trees to infer duplication / speciation nodes species tree gene trees 1 speciation nodes duplication nodes 1 3

the major challenge species tree and/or gene tree is unresolved species tree gene tree 4 the major challenge species tree and/or gene tree is unresolved species tree gene tree 5

minor challenges gene tree(s) must be rooted - midpoint rooting assumes a molecular clock - molecular clock assumption is often violated species tree must be rooted species names/ids must be substrings of sequence names/ids labeling and vizualization of results? 6 available software SDI Java, Perl gui, command line gene tree and species tree have to be fully resolved input/output: rooted trees in newick, NHX format http://www.phylogenomics.us/forester/ 7

available software Notung Java gui, command line input/output: rooted trees in newick, NHX, or Notung format species tree has to be resolved, the gene tree can be unresolved if it contains support values, output gene tree is resolved http://goby.compbio.cs.cmu.edu/notung/index.html 8 available software softparsmap Java, Jooc (XML tags) command line allows both gene tree and species tree to be unresolved input species tree: the NCBI taxonomy input gene trees: unrooted newick trees with support values input sequence: darwin database file (including GI number of GenBank entry) additional format requirements http://www.cbu.uib.no/~steffpar/softparsmap/ 9

available software softparsmap 10 available software Eulenstein software species tree has to be fully resolved, gene tree can be unresolved (?) GeneTree http://taxonomy.zoology.gla.ac.uk/rod/genetree/genetree.html available only for Mac OS 9 PPC, not scriptable Mesquite component (http://mesquiteproject.org/mesquite/mesquite.html) gtp (http://ginger.ucdavis.edu/gtp/gtp.html) primetv, reconcile (http://prime.sbc.su.se/primetv/index.html) 11

Motivation reconcile a species tree with a gene tree to infer duplication/speciation nodes Key Challenges gene tree(s) and/or species tree are often unresolved most software requires rooted input trees Precondition Steps Results user has gene tree and species tree in newick (or NHX or Schreiber) format. Species names or identifiers in species tree are substrings of sequence names adjust input format for gene tree(s), species tree, sequences, etc customize software (e.g., softparsmap) for data input (e.g., remove requirement for GI Number for sequences, allow sequence names to consist of integers and numbers) run software, parse output if necessary labeling of nodes on gene tree as "duplication", "speciation", "unknown" 12 literature SDI Zmasek CM, Eddy SR (2001) A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17: 821-828. Notung Chen C, Durand D, Farach-Colton M (2000) Notung: dating gene duplications using gene family trees. RECOMB2000, Fourth Annual International Conference on Computational Molecular Biology Chen K, Durand D, Farach-Colton M (2000) NOTUNG: A program for dating gene duplications and optimizing gene family trees. Journal of Computational Biology 7: 429-447. Durand D, Halldorsson BV, Vernot B (2005) A hybrid micro-macroevolutionary approach to gene tree reconstruction. RECOMB2005 link to the paper on Dannie Durand's website Durand D, Halldorsson BV, Vernot B (2006) A hybrid micro-macroevolutionary approach to gene tree reconstruction. J Comput Biol 13: 320-335.) 13

literature softparsmap Berglund-Sonnhammer AC, Steffansson P, Betts MJ, Liberles DA (2006) Optimal gene trees from sequences and species trees using a soft interpretation of parsimony. J Mol Evol 63: 240-250. Eulenstein software Chang, Wen-Chieh and Eulenstein, Oliver (2005) Reconciling Gene Trees with Apparent Polytomies. Technical Report, Computer Science, Iowa State University. link to technical report at IA State GeneTree Page RDM (1998) GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 14: 819-820. 14 literature reviews or other relevant articles Arvestad L, Berglund A-C, Lagergren J, Sennbald B (2004) Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution. Recomb 2004 link to pdf of article on prime website Cotton JA, Page RDM (2004) Tangled tales from multiple markers. In: Bininda-Emonds ORP, editor. Phylogenetic Supertrees. Combining information to reveal the Tree of Life. Springer. link to pdf of article on James Cotton's website Driskell AC, Ane C, Burleigh JG, McMahon MM, O'meara BC, et al. (2004) Prospects for building the tree of life from large sequence databases. Science 306: 1172-1174. Eulenstein O, Chen DH, Burleigh JG, Fernandez-Baca D, Sanderson MJ (2004) Performance of flip supertree construction with a heuristic algorithm. Systematic Biology 53: 299-308. Arvestad L, Berglund A-C, Lagergren J, Sennblad B (2003) Bayesian gene/species tree reconciliation and orthology analysis using MCMC. Bioinformatics 19: 7i-15. Rydin C, Kallersjo M (2002) Taxon sampling and seed plant phylogeny. Cladistics-the International Journal of the Willi Hennig Society 18: 485-513. Simmons MP, Freudenstein JV (2002) Uninode coding vs gene tree parsimony for phylogenetic reconstruction using duplicate genes. Molecular Phylogenetics and Evolution 23: 481-498. V'Yugin VV, Gelfand MS, Lyubetsky VA (2002) Tree reconciliation: Reconstruction of species phylogeny by phylogenetic gene trees. Molecular Biology 36: 650-658. Simmons MP, Donovan Bailey C, Nixon KC (2000) Phylogeny Reconstruction Using Duplicate Genes. Molecular Biology and Evolution 17: 469-473. Slowinski JB, Knight A, Rooney AP (1997) Inferring species trees from gene trees: A phylogenetic analysis of the elapidae (Serpentes) based on the amino acid sequences of venom proteins. Molecular Phylogenetics and Evolution 8: 349-362. 15