Presentation by Julie Hudson MAT5313

Similar documents
Multiple Sequence Alignment. Sequences

Genome Rearrangements In Man and Mouse. Abhinav Tiwari Department of Bioengineering

Dr. Amira A. AL-Hosary

Pathogenic fungi genomics, evolution and epidemiology! John W. Taylor University of California Berkeley, USA

Phylogenetics in the Age of Genomics: Prospects and Challenges

ECOL/MCB 320 and 320H Genetics

EVOLUTIONARY DISTANCES

AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM

Computational Biology: Basics & Interesting Problems

Multiple genome rearrangement: a general approach via the evolutionary genome graph

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Recent Advances in Phylogeny Reconstruction

Graph Alignment and Biological Networks

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

On the complexity of unsigned translocation distance

A New Fast Heuristic for Computing the Breakpoint Phylogeny and Experimental Phylogenetic Analyses of Real and Synthetic Data

Fast Phylogenetic Methods for the Analysis of Genome Rearrangement Data: An Empirical Study

Algorithms for Bioinformatics

Genomes and Their Evolution

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes

Xllth EUROPEAN CONFERENCE ON ANIMAL BLOOD GROUPS AND BIOCHEMICAL POLYMORPHISM

GATA family of transcription factors of vertebrates: phylogenetics and chromosomal synteny

Lecture 7 Mutation and genetic variation

Single alignment: Substitution Matrix. 16 march 2017

Chapters 25 and 26. Searching for Homology. Phylogeny

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Chromosomal Breakpoint Reuse in Genome Sequence Rearrangement ABSTRACT

Chapter 26: Phylogeny and the Tree of Life

Analysis of Gene Order Evolution beyond Single-Copy Genes

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Ensembl Genomes (non-chordates): Quick tour. This quick tour provides a brief introduction to Ensembl Genomes [2], the non-chordate genome browser.

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

Phylogenetic inference

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Prof. Fahd M. Nasr. Lebanese university Faculty of sciences I Department of Natural Sciences.

Name: Class: Date: ID: A

Chapter 26. Phylogeny and the Tree of Life. Lecture Presentations by Nicole Tunbridge and Kathleen Fitzpatrick Pearson Education, Inc.

Classification and Phylogeny

Comparing Genomes! Homologies and Families! Sequence Alignments!

Phylogeny and Molecular Evolution. Introduction

Multiple genome rearrangement

Supporting Information

Introduction to Bioinformatics Integrated Science, 11/9/05

Classification and Phylogeny

Isolating - A New Resampling Method for Gene Order Data

Understanding relationship between homologous sequences

Graphs, permutations and sets in genome rearrangement

Advanced Cell Biology. Lecture 2

Evolution by duplication

Intro Gene regulation Synteny The End. Today. Gene regulation Synteny Good bye!

HORIZONTAL TRANSFER IN EUKARYOTES KIMBERLEY MC GRAIL FERNÁNDEZ GENOMICS

Molecular evolution - Part 1. Pawan Dhar BII

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre

Chapter 18 Systematics: Seeking Order Amidst Diversity

Lecture 11 Friday, October 21, 2011

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.

Biology of Fungi. Fungal Structure and Function. Lecture: Structure/Function, Part A BIOL 4848/ Fall Overview of the Hypha

Section 18-1 Finding Order in Diversity

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

A Framework for Orthology Assignment from Gene Rearrangement Data

What do we mean by a species? Morphological species concept. Morphological species concept BIOL2007 SPECIES AND BIODIVERSITY. Kanchon Dasmahapatra

7.06 Problem Set #4, Spring 2005

FUNDAMENTALS OF MOLECULAR EVOLUTION

Martin Bader June 25, On Reversal and Transposition Medians

REPRODUCTION. 7 th Grade Science Mr. Banks

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Constructing Evolutionary/Phylogenetic Trees

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Phylogeny & Systematics: The Tree of Life

Supplemental Figure 1.

Sequence evolution within populations under multiple types of mutation

Eukaryotic photosynthetic cells

MUTATION RATES IN PLASTID GENOMES:

Chapter 26 Phylogeny and the Tree of Life

Phylogenetics: Building Phylogenetic Trees

9/19/2012. Chapter 17 Organizing Life s Diversity. Early Systems of Classification

C.DARWIN ( )

Introduction to characters and parsimony analysis

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49

Amoeba hunts and kills paramecia and stentor. Eukaryotic photosynthetic cells

Phylogeny and the Tree of Life

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Evolutionary Tree Analysis. Overview

AP Biology Exam #7 (PRACTICE) Subunit #7: Diversity of Life

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Steps Toward Accurate Reconstructions of Phylogenies from Gene-Order Data 1

Advances in Phylogeny Reconstruction from Gene Order and Content Data


Supplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM)

Phylogenetic Tree Reconstruction

PHYLOGENY & THE TREE OF LIFE

Transcription:

Proc. Natl. Acad. Sci. USA Vol. 89, pp. 6575-6579, July 1992 Evolution Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome (genomics/algorithm/inversions/edit distance/conserved segments) DAVID SANKOFF*, GUILLAUME LEDUC*, NATALIE ANTOINEt, BRUNO PAQUINt, B. FRANZ LANGt, AND ROBERT CEDERGRENt Presentation by Julie Hudson MAT5313

Definitions Gene order: permutation of genomic arrangement Phylogenetics: the study of evolutionary history and finding genetic connection between species Mitochondrial genome: complete set of genes specific to the mitochondria that guide its function Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome

Research Question Can we infer evolutionary history from the arrangement of genes in the mitochondria of various species?

Introduction Evolutionary inference traditionally done via comparison of homologous versions of a single gene mtdna is susceptible to mutation (rapid nucleotide substitution) making homology difficult to differentiate from noise Genome level analysis will be more robust to this mutation Analytic tools different when comparing gene vs genome level similarity

Data Mitochondrial genome sequences from: Fungi (8) Animals (7) Protists (1) With 31-50 mitochondrial genes (humans have 37) Why mitochondrial? Complete genome consists of a small number of genes Table genomes compared Genome Fungi Fission yeast Schizosaccharomyces pombe (6)* Budding yeast Torulopsis glabrata (7) Saccharomyces cerevisiae (8) Kluyveromyces lactis (9) Filamentous Ascomycetes Neurospora crassa (10) Aspergillus nidulans (11) Podospora anserina (12) Chytridiomycetes Allomyces macrogynus (B.P. and B.F.L., unpublished data) Protist Phytophthora infestans (B.F.L., unpublished data) Animals Vertebrates Mammalia (13, 14) Gallus gallus (chicken) (15) Echinoderms Strongylocentrotus purpuratus (sea urchin) (16) Asterina pectinifera (star fish) (17) Pisaster ochraceus (sea star) (18) Insect Drosophila yakuba (19) Nematode Ascaris suum (20) Genes, no. 35 34 39 31 50 44 45 35 31 37 37 37 37 36 37 36

Methods Edit Distance (noted as E(a,b)) is the number of elementary events necessary to change the gene order of one circular genome a into that of b. This is how gene arrangement is measured Elementary events: deletion, insertion, inversion, transposition. E(a,b) = D(a,b) + R(a,b)

Methods D(a,b) = total number of genes present in only one of a or b. (D for deletion/insertion) Simple to determine a : 1, 2, 3, 4, 5 ; b : 1, 3, 4, 5, 6 D(a,b) = 2

Methods R(a,b) = minimal number of inversion and transposition events necessary to convert one to the other, ignoring missing genes (D) Not so straight forward Based on a conserved chromosomal segment counting technique by Nadeau and Taylor, C a : 1, 2, 3, 4, 5 ; b : 1, 3, 2, 4, 5 C = 3 (what is expected from 1 inversion event) Differs if inversion occurs at the end or coincides with a previous inversion event. a : 1, 2, 3, 4, 5 ; b: 1, 2, 3, 5, 4 C = 2 C is no greater than 2R in a circular genome

Methods Alignment Reduction alignment: 1 23 4 5 6 7 89 10 reduced alignment: 2Kt@, When pairs of genes are adjacent in both genomes with same orientation and order, they can be combined because the minimum number of recombinatory events will be the same. Conserved segments can be reduced Rearrangement Distance (R) determined through a branch-and-bound search using the program DERANGE. A series of alignment reductions and inversion/ transposition events until the alignment is completely reduced (one link)

I 21 3 4 5 1 2 3 4 5 b~~~~~ivr vy 5& 2 3 5 2 4 1 3 2 5 4 1 combine 2&3 4r combine 4&5 l 4nv4& I $4 2 1 4 2 4 1 4 combine I&2 1 4 I 4 Three-inversion solution : invert I 1 4 : 1 : 4-4> combine 1 &4 Methods I DERANGE runs up to 5000 paths consecutively to determine which set of events returns the minimal rearrangement distance (where the bound-and-search is used). Paths are discarded if they cannot lead to a minimal value or if the intermediate genome is determined to be probabilistically unlikely. Example determining the minimal event distance through inversions and alignment reductions

Results Distance (D below, R above) Mam Gal Str Ast Pis Dro Asc Phy All Sch Tor Klu Sac Asp Neu Pod Mam 1 18 16 19 13 25 12 18 21 17 16 19 23 26 27 Gal 0 19 17 17 12 26 13 22 21 17 17 19 23 24 26 Str 0 0 2 1 26 27 13 21 19 19 16 20 24 27 25 Ast 4 4 4 1 22 25 13 20 16 14 18 18 24 25 25 Pis 1 1 1 5 23 24 12 17 20 17 16 19 24 24 22 Dro 0 0 0 4 1 28 11 19 21 17 19 17 26 26 27 Asc 1 1 1 5 2 1 11 15 14 13 13 12 16 20 16 Phy 26 26 26 28 25 26 25 10 10 10 8 10 12 11 15 All 12 12 12 14 13 12 13 30 15 14 13 14 17 17 16 Sch 14 14 14 18 13 14 15 28 18 15 15 18 18 19 18 Tor 17 17 17 19 16 17 18 29 19 7 11 10 15 12 15 Klu 18 18 18 20 17 18 19 28 20 8 3 11 11 11 13 Sac 20 20 20 24 19 20 21 32 24 10 5 8 13 13 15 Asp 11 11 11 13 12 11 12 31 13 13 16 17 21 10 10 Neu 15 15 15 19 16 15 16 35 19 17 22 23 25 14 9 Pod 10 10 10 14 11 10 11 30 16 14 19 20 22 11 15 Animal Protist Fungus Table 2 from paper: Distance between genome pairs

Results Edit Distance (D + R) Mam Gal Str Ast Pis Dro Asc Phy All Sch Tor Klu Sac Asp Neu Pod Mam Gal 1 Str 18 19 Ast 20 21 6 Pis 20 18 2 6 Dro 13 12 26 26 24 Asc 26 27 28 30 26 29 Phy 38 39 39 41 37 37 36 All 30 34 33 34 30 31 28 40 Sch 35 35 33 34 30 35 29 38 33 Tor 34 34 36 33 33 34 31 39 33 22 Klu 34 35 34 38 33 37 32 36 33 23 14 Sac 39 39 40 42 38 37 33 42 38 28 15 19 Asp 24 34 35 37 36 37 28 43 30 31 31 28 34 Neu 41 39 42 44 40 41 26 46 36 36 34 34 38 24 Pod 37 36 36 39 33 37 27 45 32 32 34 33 37 21 24 Animal Protist Fungus

Results Edit Distance (D + R) Mam Gal Str Ast Pis Dro Asc Phy All Sch Tor Klu Sac Asp Neu Pod Mam Gal 1 Str 18 19 Ast 20 21 6 Pis 20 18 2 6 Dro 13 12 26 26 24 Asc 26 27 28 30 26 29 Phy 38 39 39 41 37 37 36 All 30 34 33 34 30 31 28 40 Sch 35 35 33 34 30 35 29 38 33 Tor 34 34 36 33 33 34 31 39 33 22 Klu 34 35 34 38 33 37 32 36 33 23 14 Sac 39 39 40 42 38 37 33 42 38 28 15 19 Asp 24 34 35 37 36 37 28 43 30 31 31 28 34 Neu 41 39 42 44 40 41 26 46 36 36 34 34 38 24 Pod 37 36 36 39 33 37 27 45 32 32 34 33 37 21 24 Animal Protist Fungus Dark to light -> Close to far

Results Edit distances fitted to an additive tree model using a weighted leastsquares criterion generated this tree +-----------------------------------P-------Phytophthora Protist inestans +-+ ---------------------Ascarissuum I I I+-+-! +-+ +----------------- I I! +----------+-+ +. _ I I + -* -- I -* ----- -----Asterina pectnifera -Pisaster ochraceus - Strongylocentrotuspurpuratus -----Drosophilayakuba +Gallus gallus +Mamnmala +---------------------- Schizosaccharomycespombe +-------------Kluyveromyces lactis!+-------------------saccharonmyces cerevisiae +----------Torunopsis glabrata Animal +------------------ Aspergillus nidulans,+! +---------------------------Neurospora crassa I +-------------------Podospora anserina +-----------------------------.Allomtyces macrogynus Fungi

Results Validation: to show calculated R(a,b) significantly different from random noise Method: random circular permutations created and tested like the genome to determine noise level Result: Within animal and fungi group the values of R are non-random; between the groups is random Shows that R contains phylogenetic information

Discussion Overall assessment: coherence of phylogeny indicates that the macrostructures of genomes contain quantitatively meaningful information for phylogenetic reconstruction Assumptions: - All inferred rearrangement events contribute the same amount to E (solution: weighting) - Each deletion is a separate event (solution: slowgrowing convex function to determine D if events simultaneous) - No back mutation (solution: simulation study determining a correction for this underestimation)