# shared OGs (spa, spb) Size of the smallest genome. dist (spa, spb) = 1. Neighbor joining. OG1 OG2 OG3 OG4 sp sp sp

Similar documents
2 Genome evolution: gene fusion versus gene fission

The Minimal-Gene-Set -Kapil PHY498BIO, HW 3

Introduction to Bioinformatics Integrated Science, 11/9/05

Computational approaches for functional genomics

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Computational methods for predicting protein-protein interactions

Fitness constraints on horizontal gene transfer

Comparative genomics: Overview & Tools + MUMmer algorithm

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome

Evolutionary Analysis by Whole-Genome Comparisons

Genomes and Their Evolution

Chapter 15 Active Reading Guide Regulation of Gene Expression

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Chapter 19. Microbial Taxonomy

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Evolutionary Use of Domain Recombination: A Distinction. Between Membrane and Soluble Proteins

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.

Microbial Taxonomy and the Evolution of Diversity

Midterm Exam #1 : In-class questions! MB 451 Microbial Diversity : Spring 2015!

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

ABSTRACT. As a result of recent successes in genome scale studies, especially genome

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Regulation of gene expression. Premedical - Biology

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Origins of Life. Fundamental Properties of Life. Conditions on Early Earth. Evolution of Cells. The Tree of Life

Principles of Genetics

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Types of biological networks. I. Intra-cellurar networks

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Dr. Amira A. AL-Hosary

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

8/23/2014. Phylogeny and the Tree of Life

Quiz answers. Allele. BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA)

Computational Biology: Basics & Interesting Problems

Genome reduction in prokaryotic obligatory intracellular parasites of humans: a comparative analysis

The use of gene clusters to infer functional coupling

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

AP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Regulation of Gene Expression

The Complement of Enzymatic Sets in Different Species

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization

Eukaryotic vs. Prokaryotic genes

PROTEIN SYNTHESIS INTRO

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Graph Alignment and Biological Networks

Introduction. Gene expression is the combined process of :

CSCE555 Bioinformatics. Protein Function Annotation

Regulation of Gene Expression

Network Alignment 858L

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

WHERE DOES THE VARIATION COME FROM IN THE FIRST PLACE?

Gene regulation: From biophysics to evolutionary genetics

Introduction to Evolutionary Concepts

Name: Class: Date: ID: A

Phylogenetic inference

MiGA: The Microbial Genome Atlas

The Prokaryotic World

Chapter 26 Phylogeny and the Tree of Life

The Gene The gene; Genes Genes Allele;

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

1. In most cases, genes code for and it is that

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Lecture 10: Cyclins, cyclin kinases and cell division

Section 18-1 Finding Order in Diversity

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

Phylogeny & Systematics

Lecture Notes: BIOL2007 Molecular Evolution

Assessing evolutionary relationships among microbes from whole-genome analysis Jonathan A Eisen

Example of Function Prediction

Subsystem: Succinate dehydrogenase

Comparative Genomics. Comparative Methods in Biology and. Introductory article

Honors Biology Reading Guide Chapter 11

The EcoCyc Database. January 25, de Nitrógeno, UNAM,Cuernavaca, A.P. 565-A, Morelos, 62100, Mexico;

Introduction to Molecular and Cell Biology

E. coli b4226 (ppa) and Mrub_0258 are orthologs; E. coli b2501 (ppk) and Mrub_1198 are orthologs. Brandon Wills

A thermophilic last universal ancestor inferred from its estimated amino acid composition

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

3/8/ Complex adaptations. 2. often a novel trait

Chapter 26 Phylogeny and the Tree of Life

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

Biol478/ August

From gene to protein. Premedical biology

Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli

6.096 Algorithms for Computational Biology. Prof. Manolis Kellis

Correlation property of length sequences based on global structure of complete genome

Welcome to Class 21!

A tree kernel to analyze phylogenetic profiles

Self Similar (Scale Free, Power Law) Networks (I)

Transcription:

Bioinformatics and Evolutionary Genomics: Genome Evolution in terms of Gene Content 3/10/2014 1

Gene Content Evolution

What about HGT / genome sizes? Genome trees based on gene content: shared genes Haemophilus influenzae Species specific genes Escherichia coli Species specific genes

Genome trees based on gene content Presence / absence matrix: OG1 OG2 OG3 OG4 sp1 1 1 0 1 sp2 0 1 0 0 sp3 0 0 1 1 dist (spa, spb) = 1 ( # shared OGs (spa, spb) ) Size of the smallest genome d \s sp1 sp2 sp3 sp4 sp1 0 \1 0.2 0.4 0.2 sp2 0.8 0 \1 0.9 0.1 sp3 0.6 0.1 0 \1 0.3 sp4 0.8 0.9 0.7 0 \1 Neighbor joining

Genome trees based on gene content are remarkably similar to consensus on ToL M. tuberculosis M. pneumoniae M. genitalium C. pneumoniae 100 M. genitalium B. subtilis T. maritima T. pallidum Spirochaetales C. trachomatis 100 88 98 100 69 B. burgdorferi A. aeolicus Synechocystis sp. E. coli 100 93 89 100 100 100 97 100 100 A. pernix H. influenzae R. prowazekii 100 P. horikoshii H. pylori 26695 H. pylori J99 M. thermoautotrophicum M. jannaschii 0.1 Proteobacteria C. elegans S. cerevisiae A. fulgidus Euryarchaeota Eukarya

Presence / absence of genes Gene content co-evolution. (The easy case, few genomes. ) Differences between gene Content reflect differences in Phenotypic potentialities Genomes share genes for phenotypes they have in common

Huynen et al., 1998, FEBS Lett Three-way comparisons

Convergence in functional classes of gene content in small intracellular bacterial parasites Zomorodipour & Andersson FEBS Letters 1999

Although we can, qualitatively, interpret the variations in shared gene content in terms of the phenotypes of the species, quantitatively they depend on the relative phylogenetic positions of the species. The closer two species are the larger fraction of their genes they share. M. tuberculosis M. pneumoniae M. genitalium B. subtilis 100 T. maritima C. pneumoniae C. trachomatis Synechocystis sp. E. coli 100 100 H. influenzae R. prowazekii 100 100 H. pylori 26695 H. pylori J99 Proteobacteria 0.1 93 88 89 98 100 69 100 100 97 100 100 S. cerevisiae C. elegans Eukarya T. pallidum B. burgdorferi A. aeolicus P. horikoshii M. thermoautotrophicum M. jannaschii A. fulgidus Spirochaetales A. pernix Euryarchaeota

Co-evolving modules

Co-occurrence occurrence of genes across genomes as prediction for interaction / association i.e. two genes have the same presence/ absence pattern over multiple genomes: AKA phylogenetic profiles NB complete genomes absence -> needed for absence Correction for phylogenetic signal needed events b

Predicting function of a disease gene protein with unknown function, frataxin, using co-occurrence occurrence of genes across genomes / phylogenetic profiles Friedreich s ataxia No (homolog with) known function

Predicting function of a disease gene protein with unknown function, frataxin, using co-occurrence occurrence of genes across genomes Friedreich s ataxia No (homolog with) known function Frataxin has co-evolved with hsca and hscb indicating that it plays a role in iron-sulfur cluster assembly

Iron-Sulfur (2Fe-2S) cluster in the Rieske protein

Prediction: ~Confirmation:

If two genes have a significantly similar presence/absence pattern which is different from the phylogenetic signal, than their proteins are likely to interact / be in the same process/pathway This pattern can be created by independent loss and/or horizontal gene transfer (of operons)

However COG0707 COG0769 COG0770 COG0771 COG0773 COG0796 COG0812 COG1181 peptidoglycan biosynthesis pathway (highly cohesiveness, far from perfect) COG0021 COG0213 COG2820 ribose phosphate metabolism (not cohesive at all) Very few functional modules are perfect; limited cohesiveness; functional units vs evolutionary units Why?

If the phyletic patterns of two proteins are highly similar they tend to interact, but the reverse is not generally true! ~50% of modules (such as protein complexes) do not have highly similar phyletic patterns. This seems to not only depend on dataset, noise in orthology detection, or noise in module definition Fokkens and Snel PLoS Comp Biol 2009

Explaining discordant phyletic patterns of proteins that interact Many cases stories and a few large scale studies (we could also just say that evolution is flexible and proteins change function; which I am not going to argue with but (A) conservation of interaction and (B) this is a just so, non testable explanation )

Tracing the evolution of NADH:ubiquinone oxidoreductase (Complex I of the oxidative phosphorylation), from 14 subunits (Bacteria) to 46 subunits (Mammals) by comparative genome analysis Fungi: 37 Bacteria: 14 subunits Mammals: 46 Plants: 30 Algae: 30 Gabaldon et al, J. Mol. Biol 2005

Reconstructing Complex I evolution by mapping the variation onto a phylogenetic tree. After an initial surge in complexity (from 14 to 35 subunits in early eukaryotic evolution) new subunits have been gradually added and incidentally lost. most other loss is large scale

In the eukaryotic evolution of Complex I, new subunits have been added all over the complex Gabaldon et al, J. Mol. Biol 2005

Reconstructing Complex I evolution by mapping the variation onto a phylogenetic tree. After an initial surge in complexity (from 14 to 35 subunits in early eukaryotic evolution) new subunits have been gradually added and incidentally lost., most other loss is large scale Complex I loss is not always complete, S.cerevisiae and S.pombe have retained 1 and 3 proteins

Phylogeny of a remaining complex I protein in pombe The Complex I assembly protein CI30 has been duplicated in the Fungi. This can explain the presence of a CIA30-homolog in Complex I-less S.pombe? Gabaldon et al, J. Mol. Biol 2005

Why the accumulation: a neutral explanation fixation of neutral or slightly deleterious features as a general and unavoidable source of complexity in taxa with small populations Science. 2010 Nov 12;330(6006):920-1. Cell biology. Irremediable complexity? Gray MW, Lukes J, Archibald JM, Keeling PJ, Doolittle WF. How to falsify?

e.g. Neurospora mito-tyrrs Neurospora mitochondrial genome encodes several introns which require a tyrosyl trna synthetase (TyrRS) to splice. to compensate for structural defects acquired by the intron sequences BUT Introns with defects arising -> negative selection? Reverse: first binding (fortuitously or for reason unrelated to splicing) > accumulation of mutations in the intron that inactivate splicing, if TyrRS not bound. Because the compensatory / suppressive activity exists before mutation presuppression, the protein dependence by the intron could be selectively neutral (or slightly disadvantageous

Constructive neutral evolution Suggested that many taxon specific subunits (taxon specifc proteins that are a subunit in a complex) are regulatory subunits Hypothesis: neutrally added but necessary subunits could have been appropriated as regulatory subunits?

TOR1 complex Kinase Regulates growth Mutations of TOR1 components involved in Cancer TOR Raptor LST8

Evolution of TOR TOR Raptor LST8

Does evolution of TOR make more sense if we consider the whole network of interactions: TOR2 complex TOR2 is involved in rearrangement of cytoskeleton TOR Raptor LST8 TOR Rictor LST8

TOR Rictor Raptor LST8 Raptor or Rictor

? Taxonomic subunit constructive neutral evolution?

Considering triangles of interactions instead of pairs: a complementarity score

A substantial fraction of the others has a high complementarity score

Gene duplications are important in evolution and function of TOR complexes TOR1 Rictor LST8 TOR2 Raptor Gene duplication of the tor kinase in Sacceromyces cerevisiae, chytrid fungi, oomycetes, poplar. In yeast molecular biology demonstrated specialization Why in some species this happens and others not?

Other forms of co-evolution Speed/rate of evolution Acceleration after loss of binding partner Compensatory mutations, co-evolving residues: old problem, never solved, now maybe possibly in reach thanks to evfold (?), not applied to study evolution yet

Non-orthologous orthologous gene displacement/analogous proteins First systematic analysis on M.genitalium (Koonin et al., Trends Genet. 1997)

The opposite of co-occurrence: occurrence: anti-correlation / complementary patterns: predicting analogous enzymes Genes with complementary phylogenetic profiles could have a similar biochemical function. A B A B

Complementary patterns in thiamin biosynthesis predict analogous enzymes

Prediction of analogous enzymes is confirmed

Asymmetric functional/metabolic relations explain non-similar presence absence patterns Asymmetric relationships between proteins shape genome evolution. Notebaart RA, Kensche PR, Huynen MA, Dutilh BE. Genome Biol. 2009 Feb 12;10(2):R19.

Domains vs proteins? TSC1 TSC2 interaction domain of TSC2? But why the innovation? Regulatory subunit / neutral accumulation of taxon specific subunits / constructive neutral evolution?

Explaining the evolution of genes stimulates a better / more focused discussion on what we mean by gene function(al) relationship The more/better HTP functional data, the better for studying genome evolution Many different plausible, interlocking, reasons for disrupted co-occurrence across genomes of interacting proteins; (role of duplication least systematically researched?)