Microbial Taxonomy and Phylogeny: Extending from rrnas to Genomes

Similar documents
Stepping stones towards a new electronic prokaryotic taxonomy. The ultimate goal in taxonomy. Pragmatic towards diagnostics

The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome. The minimal prokaryotic genome

MiGA: The Microbial Genome Atlas

Microbial Taxonomy and the Evolution of Diversity

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Chapter 19. Microbial Taxonomy

Microbial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbial ocean metagenomics : the central dogma meets systems ecology. Ed DeLong - October 4, San Diego

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

Microbial Taxonomy. C. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbiome: 16S rrna Sequencing 3/30/2018

Introduction to polyphasic taxonomy

Outline. I. Methods. II. Preliminary Results. A. Phylogeny Methods B. Whole Genome Methods C. Horizontal Gene Transfer

Interpreting the Molecular Tree of Life: What Happened in Early Evolution? Norm Pace MCD Biology University of Colorado-Boulder

Chapter 26 Phylogeny and the Tree of Life

Introduction to the SNP/ND concept - Phylogeny on WGS data

SUPPLEMENTARY INFORMATION

Microbial Diversity. Yuzhen Ye I609 Bioinformatics Seminar I (Spring 2010) School of Informatics and Computing Indiana University

This is a repository copy of Microbiology: Mind the gaps in cellular evolution.

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Exploring Microbes in the Sea. Alma Parada Postdoctoral Scholar Stanford University

Microbiology Helmut Pospiech

Introductory Microbiology Dr. Hala Al Daghistani

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Detailed overview of the primer-free full-length SSU rrna library preparation.

A. Incorrect! In the binomial naming convention the Kingdom is not part of the name.

Bergey s Manual Classification Scheme. Vertical inheritance and evolutionary mechanisms

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Comparative genomics: Overview & Tools + MUMmer algorithm

The Evolution of DNA Uptake Sequences in Neisseria Genus from Chromobacteriumviolaceum. Cory Garnett. Introduction

Assigning Taxonomy to Marker Genes. Susan Huse Brown University August 7, 2014

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

Taxonomical Classification using:

Outline. Classification of Living Things

SUPPLEMENTARY INFORMATION

Computational methods for predicting protein-protein interactions

Taxonomy and Clustering of SSU rrna Tags. Susan Huse Josephine Bay Paul Center August 5, 2013

8/23/2014. Phylogeny and the Tree of Life

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

PHYLOGENY AND SYSTEMATICS

Classification and Phylogeny

Using Bioinformatics to Study Evolutionary Relationships Instructions

Classification and Phylogeny

Ch 10. Classification of Microorganisms

2 Genome evolution: gene fusion versus gene fission

Honor pledge: I have neither given nor received unauthorized aid on this test. Name :

Dr. Amira A. AL-Hosary

Rapid Learning Center Chemistry :: Biology :: Physics :: Math

Supplementary Information

DNA reassociation kinetics and diversity indices: richness is not rich enough

Plant Names and Classification

Mitochondrial Genome Annotation

Curriculum Links. AQA GCE Biology. AS level

Bacillus anthracis. Last Lecture: 1. Introduction 2. History 3. Koch s Postulates. 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics Chapter 1. Introduction

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Bioinformatics. Dept. of Computational Biology & Bioinformatics

SPECIATION. REPRODUCTIVE BARRIERS PREZYGOTIC: Barriers that prevent fertilization. Habitat isolation Populations can t get together

RGP finder: prediction of Genomic Islands

Introduction to Microbiology. CLS 212: Medical Microbiology Miss Zeina Alkudmani

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

Chapter 19: Taxonomy, Systematics, and Phylogeny

Biology 112 Practice Midterm Questions

Microbiology / Active Lecture Questions Chapter 10 Classification of Microorganisms 1 Chapter 10 Classification of Microorganisms

Comparing Prokaryotic and Eukaryotic Cells

Chapter 17. Table of Contents. Objectives. Taxonomy. Classifying Organisms. Section 1 Biodiversity. Section 2 Systematics

Lecture 11 Friday, October 21, 2011

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Introduction to Evolutionary Concepts

Pipelining RDP Data to the Taxomatic Background Accomplishments vs objectives

A Novel Ribosomal-based Method for Studying the Microbial Ecology of Environmental Engineering Systems

Quantifying sequence similarity

Centrifuge: rapid and sensitive classification of metagenomic sequences

Robert Edgar. Independent scientist

Prereq: Concurrent 3 CH

Evolution AP Biology

Bio 119 Bacterial Genomics 6/26/10

Probing diversity in a hidden world: applications of NGS in microbial ecology

Censusing the Sea in the 21 st Century

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign

Phylogeny & Systematics

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Ecology of Infectious Disease

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

BIOINFORMATICS: An Introduction

Assessing and Improving Methods Used in Operational Taxonomic Unit-Based Approaches for 16S rrna Gene Sequence Analysis

Istituto di Microbiologia. Università Cattolica del Sacro Cuore, Roma. Gut Microbiota assessment and the Meta-HIT program.

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

Microbial analysis with STAMP

Multiple Sequence Alignment. Sequences

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Supporting Information

Biology Assessment. Eligible Texas Essential Knowledge and Skills

Name: Class: Date: ID: A

STAAR Biology Assessment

BLAST. Varieties of BLAST

Transcription:

Microbial Taxonomy and Phylogeny: Extending from rrnas to Genomes Dr. Kostas Konstantinidis Department of Civil and Environmental Engineering & Department of Biology (Adjunct), Center for Bioinformatics and Computational Genomics Georgia Institute of Technology ICCC12 Conference Florianópolis, Brazil 2010

Outline Translating old standards to sequence: The ANI approach New insights into the species issue from natural populations Assessing the whole prokaryotic diversity & taxonomy Conclusions & Perspectives

How are species/units defined? The most popular species definition a genomically coherent (discreet) group of strains based on the hybridization of their purified DNA molecules Plus, a diagnostic phenotype Wayne et al. IJSB, 1987

The DNA-DNA hybridization method DDH general principle Isolate genomic DNA from strains A and B Random fragmentation Denature DNA Mix and let renature Quantify heteroduplex relative to homoduplex >70% => SAME species Good correspondence with phenotypically coherent clusters of strains in Enterobacteriaceae BUT + +? Difficult to do! Unclear how it relates to whole-genome relatedness. Need to have isolates available but only 1-2% of prokaryotic cells are cultivable! (the great plate count anomaly)

DDH vs. whole-genome sequence relatedness DDH vs. whole-genome Average Nucleotide Identity (ANI) 70% DDH <=> 95% ANI! (ANI) Goris, Konstantinidis, et al. IJSEM, 2007

ANI to measure relatedness Reference genome (CDS sequences) Tester genome (genomic sequence) Query Reciprocal (best match) BLASTP/N IF 30% Id. & 70% Len. of the query ORF (a.a. OR nt. level) Database Conserved Genes Conserved genes as percent of the total ORFs (normalize genome size effect) Average Nucleotide Identity of the conserved genes (measure evolutionary distance)

ANI vs. Maximum Likelihood (ML) Within A Group Determine the conserved gene core based on RBM blast searches. Build a clustalw concatenated alignment of all core genes (>2,000). Determine the best model for sequence evolution by ModelTest. Build core-based ML phylogeny and calculate the distances between the genomes of the group. Compare ML to ANI (calculate on the same, core genes) distances. Konstantinidis, Ramette, &Tiedje, Applied & Environ. Micro. 2006. Highly robust, Simple and universal, Resolution within species

MLSA vs. 3-best genesin the genome Classical MLST: Significantly different by KH test (p-value 0.02) 3 Best-genes: NOT significantly different by KH test (p-value 0.709) Resolution at the strain level with (just) three genes! Konstantinidis, Ramette, &Tiedje, Applied & Environ. Micro. 2006.

How are species/units defined? The most popular species definition a genomically coherent (discreet) group of strains based on the hybridization of their purified DNA molecules Plus, a diagnostic phenotype Wayne et al. IJSB, 1987

Genetic continuum OR discreet clusters? Whole-genome Max. Lik. phylogeny (based on 2,000 core genes) E. coli Year Year 2000 2000 Same Same picture picture with with ANI ANI

Genetic continuum OR discreet clusters? Whole-genome Max. Lik. phylogeny (based on 2,000 core genes) E. coli Year 2002 Same picture with ANI

Genetic continuum OR discreet clusters? E. coli Year 2010 Same picture with ANI Where did the strains come from? How ecologically successful there? Are intermediates viable? Share same ecology/phenotype? Luo, Konstantinidis, et al., submitted

Outline Translating old standards to sequence: The ANI approach New insights into the species issue from natural populations Assessing the whole prokaryotic diversity & taxonomy Conclusions & Perspectives

How to study natural populations?

What is metagenomics? the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species Handelsman et al. Chemistry Biology, 1998

Metagenomic sampling of the Oceans Global Ocean Survey (GOS) sampling sites Hawaii Ocean Time Series ~200Mbp of shotgun library Deep, 4000m depth (Konstantinidis & DeLong, ISME 2008) SAR3, SAR4, GOS18, GOS23, GOS34 ~200-300Mbp of shotgun library Surface, ~5m depth (Rusch et al, PLoS Biology 2007)

The approach From Konstantinidis, 2010 In: Handbook of Molecular Microbial Ecology. Volume I Metagenomics and Complementary Approaches. Editor: Frans J. de Bruijn

Coverage Clusters are ubiquitous in the Oceans! BAC clone from uncultured proteobacterium vs. Surface shotgun Based on a phylogenetic approach too! Sequence discontinuity Nucleotide Identity Synechococcus vs. Coastal Atlantic (mostly) sequencing errors

Species-like populations!? Sequence-discrete populations Smaller intra-population gene-content differences compared to several named species (<5% vs 20-30%) Detectable intra-genomic homologous recombination; albeit lower levels compared to biofilm communities (e.g., AMD) Genetic discontinuity frequently @ 95% ANI! More details: Konstantinidis &DeLong, The ISME Journal, 2008

What about populations from different environments or sites?

GroupI Crenarchaea in the Oceans Archaeal dominance in the mesopelagic zone of the Pacific Ocean Constituting up to 20% of microbial cells in the sub-photic zone Hallam Konstantinidis et al. PNAS 2006 Hallam et al. Plos Biol. 2006 Wuchter et al. PNAS. 2006 Konneke et al. Nature, 2005

How are they distributed with depth? Nucleotide identity against 4,000m crenarchaeal population DeLong et. al. Science 2006 Konstantinidis et al., Applied & Environmental Microbiology 2009 Med. 3,000m Do they have the same metabolism?

Outline Translating old standards to sequence: The ANI approach New insights into the species issue from natural populations Assessing the whole prokaryotic diversity & taxonomy Conclusions & Perspectives

The current taxonomic system Prokaryotic Taxonomy: Classification, Identification, Nomenclature There is no official prokaryotic taxonomy. Bergey s is the closest approximation to this. 8 recognized taxonomic ranks. Domain Phylum Class Order Family Genus Species Subspecies Current classification is primarily based on 16S rrna and secondarily on classical microscopic and biochemical observations. How well does 16S rrna represent the whole genome phylogeny?

16S rrnavs whole-genome relatedness From Konstantinidis and Tiedje, Current Opinion in Microbiology, 2007

Zooming in at the species level >70% DDH =>>95% ANI =>>98.5% 16S rrna

16S rrnavs gene-content relatedness Genetic distances correspond to (even greater) gene content differences!! From Konstantinidis and Tiedje, Current Opinion in Microbiology, 2007

The current taxonomic system Prokaryotic Taxonomy: Classification, Identification, Nomenclature There is no official prokaryotic taxonomy. Bergey s is the closest approximation to this. 8 recognized taxonomic ranks. Domain Phylum Class Order Family Genus Species Subspecies Current classification is primarily based on 16S rrna and secondarily on classical microscopic and biochemical observations. Designation of higher than the species ranks is rather arbitrary (clustering by 16S rrna but no standards on absolute differences).

16S rrna identity Identifying outliers of taxonomy 176X176 = 30,976 pairs Adjacent ranks show ~30% overlap Non-adjacent ranks overlap ~10 fold less frequently Average A.A. Identity From Konstantinidis and Tiedje, Journal of Bacteriology, 2005

Summary-Conclusions Translate old standards to portable sequence data Sequence-distinct populations frequently @ 95% ANI level! 16S rrna gene is reliable Genome-taxonomy for a more predictive classification system 95% ANI <=> 98.5% 16S Curr. Opin. Microbiol. 2007

Environmental Microbial Genomics Lab @ GaTech Alejandro Evolution/Bioinformatics Alex Bioinformatics Natasha Microbial Ecology Seung-Dae Env. Engineering Despoina Env. Engineering Rachel Poretsky Post-doc Tate Env. Engineering kostas@ce.gatech.edu www.enve-omics.gatech.edu Natural & Engineered microbial systems (Bioremediation)

Microbial & interdisciplinary research @ GaTech Environmental Engineering Building Downtown Atlanta Interested? Email kostas@ce.gatech.edu

Acknowledgments People Kostas group @ Georgia Tech AlejanderLuo, Bioinformatics Alejandro Caro, Microbial Ecology/Bioinformatics Natasha DeLeong, Microbial Ecology Seung-Dae Oh, Env. Engineering DespoinaTsementzi, Env. Engineering Tate Nixon, Env. Engineering Support Genomes to Life Program Award #DE-FG02-07ER64389 DeLong s group @ M.I.T. Prof. Steven Hallam (now @ UBC) Dr. Virginia Rich (now @ U of Arizona) Dr. Gene Tyson (now @ U of Queensland) National Science Foundation IOS-0919251 & CBET-0967130 Other collaborators Prof. Frank Loeffler (GaTech), on bioremediation communities Prof. Spyros Pavlostathis (Gatech), antibiotic resistance Prof Hang Lu (GaTech), on single-cell genomics Prof. James Tiedje (MSU), on the species concept Dr. Alban Ramette (Max Planck, Bremen), on biogeography

Phylogenetic relationships based on AAI (Phylip) Neighbor joining tree of the full 176X176 matrix of AAI. Same results by weighted NG (Weighbor). Acinetob. A.pernix A.tumefaciens 175 genomes Acinetobacter sp. 1.00 Aeropyrum pernix 0.95 1.00 Agrobacterium tumefaciens 0.66 0.88 1.00 175 genomes total All phyla and several classes are as deep branching as Archaea!